ABSTRACT

This  work  investigates  issues  related  to  distribution  of  low-bit-rate  video  within 
the  context  of  a  teleconferencing  application  deployed  over  a  tactical  ATM  network.  The 
main  objective  is  to  develop  mechanisms  that  support  transmission  of  low  bit  rate  video 
streams  as  a  series  of  scalable  layers  that  progressively  improve  quality.  The  hierarchical 
nature  of  the  layered  video  stream  is  actively  exploited  along  the  transmission  path  from 
the  sender  to  the  recipients  to  facilitate  transmission. 

A  new  layered  coder  design  tailored  to  video  teleconferencing  in  the  tactical 
environment  is  proposed.  Macroblocks  selected  due  to  scene  motion  are  layered  via 
subband  decomposition  using  the  fast  Haar  transform.  A  generalized  layering  scheme 
groups  the  subbands  to  form  an  arbitrary  number  of  layers.  As  a  layering  scheme  suitable 
for  low-motion  video  is  unsuitable  for  static  slides,  the  coder  adapts  the  layering  scheme 
to  the  video  content.  A  suboptimal  rate  control  mechanism  is  investigated  that  reduces
the  k-dimensional  rate-distortion  problem,  which  results  from  the  use  of  multiple  quantizers
tailored  to  each  layer,  to  a  1-dimensional  problem  by  creating  a  single  rate-distortion
curve  for  the  coder  in  terms  of  a  suboptimal  set  of  k-dimensional  quantizer  vectors.
Rate  control  is  thus  simplified  into  a  table  lookup  of  a  codebook  containing  the 
suboptimal  quantizer  vectors.  The  rate  controller  is  ideal  for  real-time  video  and  limits 
fluctuations  in  the  bit-stream  with  no  corresponding  visible  fluctuations  in  perceptual 
quality. 

A  traffic  smoother  prior  to  network  entry  is  developed  to  increase  queuing  and 
scheduler  efficiency.  Three  levels  of  smoothing  are  studied:  frame,  layer,  and  cell 
interarrival.  Frame  level  smoothing  occurs  via  rate  control  at  the  application. 
Interleaving  and  cell  interarrival  smoothing  are  accomplished  using  a  leaky  bucket 
mechanism  inserted  prior  to  the  adaptation  layer  or  within  the  adaptation  layer. 
Simulations  indicate  that  smoothing  lowers  bandwidth  requirements  for  a  given  quality  of 
service  and  that  interleaving  cells  from  different  layers  enhances  the  effectiveness  of 
priority-based  scheduling  schemes. 


A  new  cell-scheduling  scheme  is  proposed  that  exploits  the  layered  video 
hierarchy  to  allow  more  graceful  degradation  in  visual  quality  during  periods  of  cell  loss. 
Quality  of  service  at  the  connection  level  is  maintained  using  an  optimal  scheduling 
algorithm  that  accounts  for  the  cell  loss  rate  and  cell  transfer  delay  requirements  for  each 
connection.  Within  the  connection,  a  prioritization  scheme  denies  service  to  cells  from 
lower  priority  layers  during  periods  of  congestion  and  cells  deemed  non-viable  due  to 
group  of  blocks  (GOB)  corruption  to  increase  the  probability  that  cells  from  higher 
priority  layers  are  transmitted.  Simulations  indicate  that  protecting  higher  priority  layers 
requires  accepting  a  corresponding  decrease  in  throughput.  Depending  on  the 
prioritization  scheme  used,  cell  loss  rates  for  the  base  video  layer  can  either  be 
maintained  at  the  desired  rate  or  improved  by  an  order  of  magnitude  relative  to  no 
prioritization.  Cell  discarding  allows  the  scheduler  to  recover  bandwidth  from  non-viable
cells  although  the  impact  within  the  connection  depends  on  the  service  discipline.  As  the 
GOB  size  increases,  cell  discarding  is  improved  if  cells  from  different  layers  are 
interleaved  to  reflect  spatial  dependency  between  the  base  layer  and  the  enhancement
layers.

I.  INTRODUCTION

Multimedia  applications  support  the  processing,  transmission,  and  control  of 
streams  of  related  audio-visual  signals  including  text,  images,  audio,  and  video  data  [1]. 
Common  examples  include  streaming  applications,  such  as  video-on-demand  (VOD),  and 
interactive  applications,  such  as  video  and  audio  teleconferencing.  Multimedia 
applications  offer  difficult  challenges  for  network  design  due  to  the  need  to  bound  data 
loss  in  transmission,  the  need  to  limit  transmission  delays,  and  the  need  for  synchronizing 
the  related  streams  comprising  a  multimedia  session.  In  particular,  video 
teleconferencing  (VTC)  demonstrates  the  great  potential  of  multimedia  applications  to 
deliver  information  but,  at  the  same  time,  poses  difficult  distribution  problems  for  the 
hosting  network. 

VTC  plays  an  important  role  in  the  U.S.  Navy's  Information  Technology  for  the 
21st  Century  initiative  (IT-21).  IT-21  seeks  to  transform  the  current  platform-centric
approach  to  warfighting  into  a  network-centric  approach  that  leverages  information
superiority  with  current  and  planned  smart  weapons  [2].  At  the  battlegroup  level, 
deploying  VTC  over  a  tactical  network  that  links  individual  units  via  a  wireless  link 
offers  several  benefits  including  collaborative  planning,  remote  maintenance,  distance 
learning,  and  telemedicine.  However,  a  tactical  network  thus  envisioned  presents
constraints  not  typically  present  in  traditional  wireline  networks.  The  tactical  network 
may  be  viewed  as  an  internetwork  of  shipboard  wireline  local  area  networks  (LANs) 
interconnected  by  a  wireless  channel.  The  wireless  channel  serves  as  a  bottleneck  within 
the  tactical  network  and  constrains  both  the  available  bit  rate  and  transmission  quality. 
Each  of  these  constraints  impacts  the  perceived  quality  of  any  deployed  VTC  application. 

A.    BACKGROUND 

This  section  provides  additional  information  on  the  IT-21  initiative  and  VTC  to 
provide  a  context  for  the  problem  scenario  in  the  next  section.  Additionally,  the  type  of 
service  required  to  support  VTC  is  briefly  considered. 


1.  IT-21 

A  brief  examination  of  the  IT-21  initiative  is  valuable  for  determining  the  baseline
network  architecture  to  host  a  tactical  VTC  application.  The  goal  of  IT-21  is  to  link  all
U.S.  Forces  together  in  a  network  that  enables  the  transmission  of  voice,  video  and  data 
from  individual  workstations  seamlessly  to  both  local  and  remote  users  [2] [4].  The 
anticipated  network  is  heterogeneous  and  allows  connectivity  among  wireline  LANs 
using  both  wireless  and  satellite  communication  links.  All  networks  and  interfaces  are  to 
use  commercial  off-the-shelf  (COTS)  technology  built  to  current  industry  standards. 

Focusing  on  the  battlegroup  level,  shipboard  LANs  are  to  have  ATM  backbones. 
Individual  workstation  connectivity  is  provided  initially  via  100  Mbps  Fast  Ethernet  with 
a  future  transition  to  direct  ATM  connections.  Connectivity  among  units  of  the 
battlegroup  is  provided  by  EHF  links  with  a  minimum  data  rate  of  128  kbps  to  support 
messaging  and  maintain  a  common  tactical  picture.  However,  to  support  multimedia 
applications,  such  as  VTC  or  collaborative  planning  with  high  resolution,  early 
projections  indicate  that  a  minimum  data  rate  of  1280  kbps  is  required. 

2.  Video  Teleconferencing 

Teleconferencing  systems  can  be  broken  into  three  categories:  audio-only,  audio 
and  graphics,  and  video.  VTC  is  an  interactive  application  requiring  low  network 
latency,  bounded  delay  jitter,  and  low  cell  loss  to  both  preserve  audiovisual  quality  and 
maintain  the  sense  of  interactivity.  In  addition,  careful  synchronization  is  required 
between  the  audio  and  video  streams.  While  communication  may  be  unicast  as  in  peer- 
to-peer  applications,  the  more  challenging  problem  of  multicast  communication  is 
considered  here.  As  such,  each  sender  is  assumed  to  transmit  to  multiple  receivers  in  the 
multicast  group.  In  turn,  the  multicast  group  consists  of  some  combination  of  active 
participants  that  receive  and  transmit  and  passive  participants  that  receive  only.  This 
situation  is  illustrated  in  Figure  1.1.

Since  video,  and  audio  to  a  lesser  degree,  is  bandwidth  intensive,  signals  are 
compressed  prior  to  transmission  and  trade  some  reduction  in  quality  for  a  reduction  in 
bandwidth.  Multimedia  communications,  therefore,  require  dedicated  terminals,  which
capture  and  prepare  signals  for  transmission  over  the  network  and  reconstruct  received
streams  by  decompressing  and  resynchronizing  different  streams  as  required.
Commercial  VTC  applications  have  been  facilitated  by  the  emergence  of  ITU  standards 
for  multimedia  terminals  [3].  Each  standard  targets  a  bandwidth  range  (and  thus  quality), 
a  particular  networking  standard,  and  incorporates  a  family  of  associated  standards  to 
support  the  required  audio  and  video  compression,  control  signals,  and  network  interface. 


Figure  1.1:  Simple  VTC  Multicast  with  Two  Active  and  Two  Passive  Nodes.


3.  Multimedia  Applications  and  QoS 

Quality  of  service  (QoS)  denotes  a  set  of  one  or  more  parameters  describing  the 
level  of  service  granted  to  an  application  by  a  network  or  required  from  the  network  by 
the  application  for  acceptable  performance.  Many  possible  QoS  parameters  exist,  but  the 
typical  parameters  employed  are  maximum  allowable  delay,  delay  variation  or  jitter,  and 
cell  loss  rates.  The  QoS  requirements  for  a  particular  multimedia  application  depend  on 
the  types  of  information  transmitted  and  the  manner  in  which  the  information  is 
compressed  or  packaged  for  transmission.  More  generally,  multimedia  applications  are 
characterized  by  the  manner  in  which  information  is  distributed,  the  degree  of 
interactivity,  and  the  type  of  information  transported  [1]. 


Multimedia  communications  are  either  unicast  or  multicast.  Unicast  represents 
peer-to-peer  communication  while  multicast  represents  m-to-n  communication,  where  m
ranges  from  1  to  n.  Unicast  examples  include  client-server  applications,  such  as  VOD. 
Multicast  examples  include  distance-learning  and  tele-remote  conferencing.  As  will  be 
discussed  later,  the  manner  of  communications  between  the  source  and  recipients  may 
complicate  information  delivery  depending  on  the  type  of  network  employed. 

Multimedia  applications  are  either  interactive  or  streaming.  Streaming 
applications  are  either  unicast  or  multicast  and  are  channel  asymmetric:  significant 
content  flows  in  only  one  direction.  Interactive  applications  tend  to  have  content 
flowing,  in  part,  in  at  least  two  directions  although  the  flow  may  not  be  fully  symmetric. 
Streaming  applications  usually  do  not  require  strict  bounds  on  delay  but  are  sensitive  to 
delay  jitter.  Interactive  applications  usually  require  strict  bounds  on  both  delay  and
delay  jitter  or  on  neither,  depending  on  the  information  content.

The  information  flow  for  multimedia  applications  is  either  continuous  or 
intermittent.  Applications  with  intermittent  flow  are  not  usually  delay  sensitive  but  tend 
to  tolerate  cell  loss  poorly.  Examples  include  text  files,  still  images,  and  graphics.  For 
applications  with  continuous  flow,  such  as  video  and  audio,  delay  sensitivity  depends  on 
whether  the  application  involved  is  interactive  or  streaming  as  mentioned  above.  Some 
cell  loss  is  acceptable  for  continuous  flows  although  the  degree  depends  on  the 
information  source  as  well  as  the  amount  of  compression  involved. 

B.         PROBLEM  SCENARIO 

This  section  lays  out  scenario  parameters  for  a  tactical  shipboard  VTC  application 
and  discusses  difficulties  with  preserving  video  quality  using  traditional  video  coders 
over  heterogeneous  networks. 

1.  Target  Scenario 

Using  the  IT-21  requirements  as  a  baseline,  the  battlegroup  tactical  network  is 
assumed  to  be  a  hybrid  wireline/wireless  ATM  network.  Shipboard  networks  employ  an 
ATM  backbone  and  provide  complete  ATM  connectivity  to  the  desktop,  offering  either
native  ATM  services  or  legacy  LAN  emulation  over  ATM.  An  ATM  wireless  network
provides  connectivity  within  the  battlegroup.  A  centralized  control  station,  usually  the 
capital  ship  within  the  battlegroup,  may  manage  access  to  the  wireless  network.  This 
network  is  illustrated  in  Figure  1.2. 

Intrinsically,  this  arrangement  offers  asymmetric  bandwidth  depending  on 
whether  a  connection  remains  shipboard  or  is  ship-to-ship.  Given  the  current  capabilities 
of  ATM  network  interface  cards  (NICs),  workstations  can  expect  a  maximum  bandwidth 
of  10-25  Mbps  with  correspondingly  higher  bandwidths  across  the  backbone.  However, 
given  current  technology,  wireless  data  rates  are  far  more  limited.  A  reasonable 
assumption  is  a  bandwidth  of  at  least  1  Mbps,  a  value  well  within  the  capability  of 
commercially  available  technologies,  such  as  Multichannel  Multipoint  Distribution 
Service  (MMDS)  broadband  wireless  transmission.  MMDS  offers  line-of-sight  (LOS) 
service  in  the  2.1  GHz  to  2.7  GHz  band  with  data  rates  up  to  1.5  Mbps.  Satellite  links
complete  the  connectivity  to  land-based  LANs  but  are  not  considered  further  here  since 
their  high  latency  precludes  satisfactory  performance  for  interactive  multimedia. 

The  maximum  quality  of  any  multimedia  application  depends  in  part  on  available
bandwidth  (network  services  play  an  equally  important  role).  While  the  network
described  here  provides  for  high  bandwidth  aboard  individual  units,  networking  between 
units  is  constrained  by  the  wireless  interface.  Thus,  deploying  a  tactical  VTC  application 
at  the  battlegroup  level  requires  operating  within  this  bandwidth  constraint. 

To  provide  a  basis  for  the  work  presented  here,  a  set  of  reasonable  requirements 
for  low-bit-rate  tactical  VTC  is  proposed  below  using  international  standards  where 
possible  to  keep  within  the  spirit  of  IT-21.  Given  the  bandwidth  constraints,  both  the 
audio  and  video  streams  must  be  compressed.  Toll  quality  speech  demands  far  less 
bandwidth  than  video  and  can  be  reasonably  limited  to  8  kbps  or  less  using  code  excited 
linear  prediction  (CELP)  speech  coding  [5].  Video  bandwidth  requirements  depend  on 
the  desired  resolution,  frame  rate,  color  depth,  and  the  permissible  tradeoff  between 
compression  gain  and  perceptual  quality.  Current  low-bit-rate  ITU  multimedia  standards, 
such  as  H.320  and  H.324,  use  low  resolutions  and  frame  rates  to  enable  acceptable  video
quality  [3].  Using  these  standards  as  a  guideline,  the  tactical  VTC  transmits  video  signals
at  10  fps  using  the  Quarter  Common  Intermediate  Format  (QCIF)  with  a  resolution  of 
176x144  pixels  and  targets  bit  rates  in  the  range  of  64-96  kbps.  The  primary  color  depth 
supported  is  8-bit  grayscale  although  4:2:0  sub-sampled  24-bit  color  [6]  is  a  possible 
option.  These  requirements  are  summarized  in  Table  1.1.

Figure  1.2:  Hybrid  ATM  Wireline/Wireless  Network.


VTC Stream    Parameter      Value
----------    -----------    ------------------------------
Video         Bandwidth      64-96 kbps
              Resolution     176x144 (QCIF)
              Frame Rate     10 fps
              Color Depth    8-bit gray / 4:2:0 24-bit color
Audio         Bandwidth      < 8 kbps

Table  1.1:  Tactical  VTC  Multimedia  Requirements.


2.  Video  Compression  and  Robustness 

Given  the  parameters  in  Table  1.1,  a  video  compression  gain  of  approximately  31
to  1  is  required  to  transmit  8-bit  grayscale,  assuming  an  average  available  bit-rate  of  64
kbps.  Such  gains  are  easily  within  the  capability  of  current  video  coding  standards,  such
as  H.263  and  MPEG-1/2.  However,  traditional  video  compression  schemes  are  not
particularly  suitable  for  multicast  transmission  over  packet-based  networks. 
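
The required compression gain follows directly from the parameters in Table 1.1. The short calculation below is only a worked check of that arithmetic; the function names are illustrative, and it assumes that 1 kbps equals 1000 bits per second.

```python
def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def compression_gain(raw_bps, channel_bps):
    """Ratio of the uncompressed rate to the available channel rate."""
    return raw_bps / channel_bps


# QCIF, 8-bit grayscale, 10 fps (values taken from Table 1.1).
raw = raw_bit_rate(176, 144, 8, 10)

# Assumption: 1 kbps = 1000 bits/s.
gain_at_64 = compression_gain(raw, 64_000)
gain_at_96 = compression_gain(raw, 96_000)

print(f"raw rate: {raw} bits/s")
print(f"gain at 64 kbps: {gain_at_64:.1f} to 1")
print(f"gain at 96 kbps: {gain_at_96:.1f} to 1")
```

The uncompressed stream is roughly 2.03 Mbps, so the 64 kbps end of the target range sets the more demanding gain, while the 96 kbps end needs a gain of only about 21 to 1.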

Video  codecs  compress  the  original  video  stream  by  removing  the  least 
perceptually  relevant  content  and  by  encoding  only  the  differences  between  successive 
frames  caused  by  motion.  Unfortunately,  packet-based  networks  invariably  drop  packets 
due  to  congestion,  even  in  network  architectures  offering  QoS  guarantees,  such  as  ATM 
networks.  Due  to  the  high  compression  gains  required  for  transmission,  each  packet 
contains  a  significant  amount  of  information.  The  loss  of  a  single  packet  corrupts  a 
portion  of  a  frame  or  an  entire  frame  depending  on  the  decoder's  ability  to  resynchronize 
with  the  incoming  bit  stream  [7].  With  motion  compensation,  any  visual  error  artifacts 
introduced  may  persist  for  many  frames  past  the  initial  point  of  corruption  (until  the  next 
I-frame  in  an  MPEG  stream  and  possibly  indefinitely  in  H.263  [8]).  The  effect  of  packet 
losses  grows  more  significant  as  bit  rate  decreases. 

The  problem  of  packet  losses  may  be  mitigated  within  the  network  or  at  the 
application  layer.  Within  the  network,  appropriate  QoS  guarantees  can  reduce  cell  losses 
to  a  level  such  that  any  quality  degradation  due  to  transmission  errors  is  acceptable. 
However,  the  required  cell  loss  rates  can  be  quite  small,  on  the  order  of  10"^,  which 
requires  a  large  allocation  of  bandwidth  to  achieve.  Two  common  approaches  to 
improving  error  robustness  at  the  application  level  are  to  use  codecs  without  motion 
compensation,  such  as  Motion-JPEG  [9],  or  to  vary  the  bit  rate  in  response  to  the 
estimated  degree  of  congestion  within  the  network.  Motion-JPEG  compresses  each 
frame  individually,  thereby  greatly  improving  robustness  since  visual  artifacts  are 
confined  to  the  affected  frame.  However,  robustness  comes  with  lower  compression 
gains,  and,  therefore,  Motion-JPEG  delivers  unacceptable  quality  at  low  bit  rates.  If  the 
source  coder  is  controllable  [11],  network  feedback  reports  can  be  used  to  modify  the 
demand  placed  on  the  network  by  changing  the  quality  of  video  transmission.  While  this 
approach  provides  no  inherent  improvement  in  the  error  resilience  of  the  video  stream, 
it  does  try  to  mitigate  the  effects  of  congestion  on  the  received  video  stream.


However,  designing  a  scheme  for  controlling  the  source  rate  is  difficult  when 
multicast  transmission  over  a  heterogeneous  network  is  considered.  A  heterogeneous 
network  may  be  defined  as  one  in  which  end-users  are  stratified  by  available  bandwidths 
and  processing  and  display  capabilities  [12].  Using  feedback  to  monitor  congestion 
within  the  network  and  then  making  appropriate  changes  to  the  outgoing  video  stream 
becomes  problematic  as  multicast  group  size  increases  or  as  the  network  topology  grows 
more  complex.  Feedback  messages  may  potentially  add  to  congestion  depending  on  the 
periodicity  of  transmission.  More  importantly,  since  each  user  represents  a  different  path 
through  the  network,  each  connection  potentially  experiences  a  different  level  of 
congestion.  The  controllable  application  is  faced  with  a  quandary  in  responding  fairly  if 
only  a  small  number  of  members  within  the  multicast  group  are  experiencing  congestion. 
Stratification  poses  a  further  problem  during  transmission  of  real-time  video  since  each 
user  has  different  expectations  and  tolerances  with  regard  to  video  quality.  Users  with 
high  bandwidth  expect  high  quality  video  while  users  with  low  bandwidth  are  generally 
satisfied  with  less.  Meeting  the  varied  expectations  with  a  single  video  stream  is  clearly 
impractical  and  transmitting  multiple  video  streams  with  gradations  in  quality  requires 
greater  bandwidth. 

C.         DISSERTATION  OBJECTIVES 

Given  the  interest  in  deploying  VTC  applications  over  tactical  networks  such  as 
those  envisioned  by  the  U.S.  Navy's  IT-21  initiative,  distributing  the  video  stream  while
maintaining  acceptable  quality  involves  reconciling  the  requirements  of  multimedia 
applications  with  the  capabilities  of  tactical  networks.  As  discussed  in  the  previous 
section,  video  is  bandwidth  intensive  and  highly  sensitive  to  transmission  errors.  A 
tactical  network  may  be  characterized  as  low  bit  rate,  unreliable,  and  heterogeneous. 
Solving  the  distribution  problem  solely  in  terms  of  coder  design  or  network  design  is  less 
effective  than  developing  a  unified  solution  that  reaches  across  the  application-network 
boundary. 


Accordingly,  this  dissertation  investigates  issues  related  to  distributing  low-bit- 
rate  video  within  the  context  of  a  teleconferencing  application  deployed  over  a  tactical 
ATM  network.  The  main  objective  is  to  develop  mechanisms  that  support  transmission 
of  low-bit-rate  video  streams  as  a  series  of  scalable  layers  that  progressively  improve 
quality.  These  mechanisms  exploit  the  hierarchical  nature  of  the  layered  video  stream 
along  the  transmission  path  from  the  sender  to  the  recipients  to  facilitate  transmission. 
Specifically,  the  approach  proposed  in  this  dissertation  works  across  the  application- 
network  interface  by  coding  the  video  stream  into  layers,  shaping  the  resulting  layered 
video  stream  prior  to  entry  into  the  network,  and  prioritizing  service  in  accordance  with 
the  relative  perceptual  importance  of  each  layer.  The  resulting  distribution  path  is 
illustrated  in  Figure  1.3. 


Figure  1.3:  Functional  Diagram  for  a  Multicast  VTC  Application. 

Each  of  these  mechanisms  centers  on  dividing  the  video  stream  into  an
independently  decodable  base  layer  that  guarantees  a  minimum,  acceptable  level  of 
quality  and  several  enhancement  layers  that  increase  quality  in  a  hierarchical  manner. 
Transmitting  video  in  layers  has  several  inherent  benefits.  The  layered  structure  provides 
a  means  for  implementing  open-loop  congestion  control  by  allowing  recipients  to  drop 
layers  exhibiting  high  packet  loss  rates,  thereby  reducing  network  loading  [12].  Earlier 
work  by  Rhee  and  Gibson  [13]  indicates  that  layered  video  exhibits  improved  resilience 
to  bit  errors  introduced  during  transmission  since  spreading  bit  errors  across  multiple 
layers  has  less  impact  on  the  reconstructed  video. 

Here,  a  new  layered  coder  design  tailored  to  video  teleconferencing  in  the  tactical 
environment  is  proposed.  Specifically,  the  coder  is  optimized  for  VTC  video  scenes 
consisting  of  low  motion  video,  such  as  a  "talking  head,"  and  static  scenes  corresponding 
to  presentation  slides.  The  concession  to  the  tactical  environment  is  an  emphasis  on  low- 
bit-rate  coding,  low-complexity  coding  for  low  delay  and  power  requirements,  and 
inherent  robustness  to  minimize  the  effect  of  packet  losses  and  bit  errors.  Two  major 
problems  are  considered.  The  first  is  how  to  effectively  map  frequency
content  to  the  requisite  number  of  layers  and  thus  create  the  required  perceptual
hierarchy.  The  generalized  layering  scheme  presented  here  uses  the  fast  Haar  transform  to
segregate  frequency  content  into  subbands;  these  subbands  are  then  grouped  by 
perceptual  relevance  to  form  the  required  number  of  layers.  However,  a  layering  scheme 
suitable  for  low-motion  video  is  unsuitable  for  static  slides.  Static  slides  place  a  much 
greater  emphasis  on  high-frequency  content,  and  an  appropriate  layering  scheme  is 
included  with  the  coder  design.  The  coder  adapts  to  the  current  video  type  by  shifting  to 
the  correct  layering  structure. 
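
The layering idea can be sketched in a few lines. The example below is a minimal illustration, not the coder proposed here: it assumes an unnormalized averaging and differencing form of the Haar transform, a single decomposition level, and a square block with even dimensions. The subband names and the three-layer layout `MOTION_LAYOUT` are hypothetical choices made for illustration only.

```python
def haar_1d(values):
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d(block):
    """One decomposition level of a square block: rows first, then columns."""
    rows = [haar_1d(row) for row in block]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def split_subbands(coeffs):
    """Cut the coefficient array into four quadrants.

    First letter = horizontal filter, second = vertical
    (L: average, H: difference).
    """
    n = len(coeffs) // 2

    def quadrant(r0, c0):
        return [row[c0:c0 + n] for row in coeffs[r0:r0 + n]]

    return {
        "LL": quadrant(0, 0),
        "HL": quadrant(0, n),
        "LH": quadrant(n, 0),
        "HH": quadrant(n, n),
    }


def group_layers(subbands, layout):
    """Group named subbands into layers; layout[0] is the base layer."""
    return [{name: subbands[name] for name in names} for names in layout]


# Hypothetical three-layer layout for low-motion video (not taken from the text).
MOTION_LAYOUT = [["LL"], ["HL", "LH"], ["HH"]]

block = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
layers = group_layers(split_subbands(haar_2d(block)), MOTION_LAYOUT)
```

Because the layout is passed in as data, switching to a different layering structure for static slides would only require a different layout table, which is one plausible way to realize the adaptation described above.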

The  second  problem  is  developing  a  rate  control  scheme  for  the  layered  video 
coder.  Rate  control  is  a  requisite  for  maintaining  a  desired  QoS  level  in  an  ATM 
network,  but  the  use  of  multiple  quantizers  complicates  developing  an  optimal  rate 
controller  appropriate  for  a  real-time  application.  A  more  appropriate  alternative  is  a
suboptimal  rate  control  mechanism  that  reduces  the  k-dimensional  rate-distortion  problem
resulting  from  the  use  of  multiple  quantizers  to  a  1-dimensional  problem  by  creating  a
single  rate-distortion  curve  for  the  coder  in  terms  of  a  suboptimal  set  of  k-dimensional
quantizer  vectors.  Rate  control  can  thus  be  simplified  into  a  table  lookup  of  a
codebook  containing  the  suboptimal  quantizer  vectors.
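
The table lookup can be illustrated as follows. This is a sketch under stated assumptions rather than the controller developed later: the codebook entries, the expected bits per frame, and the quantizer step sizes are invented for illustration, and the selection rule (pick the highest-rate entry that fits the budget) is one simple possibility.

```python
from bisect import bisect_right

# Hypothetical codebook, sorted by expected rate. Each entry pairs an expected
# number of bits per frame with one quantizer step size per layer (k = 3 here).
CODEBOOK = [
    (4000, (16, 32, 48)),
    (6400, (12, 24, 40)),
    (8000, (10, 20, 32)),
    (9600, (8, 16, 24)),
]


def select_quantizers(bit_budget, codebook=CODEBOOK):
    """Return the quantizer vector of the highest-rate entry within the budget.

    Falls back to the coarsest entry when the budget is below every entry.
    """
    rates = [rate for rate, _ in codebook]
    index = bisect_right(rates, bit_budget) - 1
    return codebook[max(index, 0)][1]


# 64 kbps at 10 fps leaves 6400 bits per frame (1 kbps = 1000 bits/s assumed).
chosen = select_quantizers(64_000 // 10)
```

Since the entries lie along a single rate axis, the search over k quantizers collapses to a one-dimensional lookup, which is what makes the approach attractive for real-time use.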

The  manner  in  which  the  compressed  bit  stream  is  transmitted  to  the  network  has 
a  profound  effect  on  queuing  efficiency  and  therefore  the  bandwidth  required  to  meet  the 
required  QoS.  Smoothing  the  video  traffic  reduces  variation  and  uncertainty  in  the 
arrival  process  and  improves  queuing  efficiency.  Here,  a  traffic  shaper  is  employed  to 
deterministically  smooth  the  entire  stream,  all  layers  included,  to  maximize  queuing 
efficiency.  The  only  drawback  to  smoothing  is  the  insertion  of  additional  delay  in  the 
transmission  path  due  to  the  need  to  buffer  an  entire  encoded  frame  prior  to  transmission. 
However,  a  new  scheme  is  proposed  that  partially  offsets  the  delay.  The  traffic  shaper  is 
also  responsible  for  interleaving  cells  from  each  layer  for  transmission  within  the 
outgoing  stream.  Order  of  arrival  into  the  queue  appears  to  affect  scheduling 
performance  in  priority-based  scheduling  systems  [16]. 
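
A minimal sketch of the two shaper functions described above is given below. It assumes that one encoded frame has already been buffered and segmented into cells per layer, that cells are interleaved in simple round-robin order, and that departures are spaced evenly over one frame interval. The interleaving pattern and the cell labels are assumptions made for illustration.

```python
from itertools import zip_longest


def interleave_layers(layers):
    """Round-robin merge of per-layer cell lists; layers[0] is the base layer."""
    merged = []
    for group in zip_longest(*layers):
        merged.extend(cell for cell in group if cell is not None)
    return merged


def smooth_departures(cells, frame_interval):
    """Space the cells of one frame evenly across the frame interval (seconds)."""
    if not cells:
        return []
    spacing = frame_interval / len(cells)
    return [(i * spacing, cell) for i, cell in enumerate(cells)]


# One encoded frame split into cells per layer (labels are illustrative).
frame = [["B0", "B1", "B2"], ["E1a", "E1b"], ["E2a"]]

# At 10 fps the frame interval is 0.1 s.
departures = smooth_departures(interleave_layers(frame), frame_interval=0.1)
```

Even spacing removes the burst that would otherwise arrive at the start of each frame interval, at the cost of the buffering delay noted above.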

Layered  video  traffic  offers  another  dimension  to  the  scheduling  problem  as  well 
as  an  avenue  for  reducing  the  impact  of  network  congestion  on  the  overall  quality  of  the 
reconstructed  video.  Since  video  is  transmitted  as  a  base  layer  and  a  series  of 
enhancement  layers,  a  hierarchical  priority  system  is  appropriate.  During  periods  of  no 
congestion,  the  layered  video  connection  is  serviced  at  its  required  QoS  without  regard  to 
the  layering  structure.  During  network  congestion,  emphasis  is  placed  on  servicing  the 
most  perceptually  important  layers,  starting  with  the  base  layer,  and  denying  service  to 
the  least  important  layers.  Conceptually,  the  overall  connection  is  granted  a  certain 
bandwidth.  As  cell  loss  increases  due  to  congestion,  the  bandwidth  is  reallocated  to 
support  only  the  most  important  layers. 
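
The reallocation within a connection can be sketched as a simple selection rule. The example below is not one of the scheduling algorithms proposed later; it assumes a fixed number of service slots per interval, treats the connection as congested whenever the queue exceeds those slots, and uses a hypothetical cell representation of `(layer, cell_id)` with layer 0 as the base layer.

```python
def serve_connection(queue, slots):
    """Choose which queued cells of one layered connection to serve.

    queue: list of (layer, cell_id) tuples in arrival order; layer 0 is the base.
    slots: number of cells that can be served in this interval.
    Returns (served, denied), each in arrival order.
    """
    if len(queue) <= slots:
        # No congestion: serve everything without regard to layer.
        return list(queue), []

    # Congestion: rank cells by layer. The sort is stable, so earlier
    # arrivals win among cells of the same layer.
    ranked = sorted(range(len(queue)), key=lambda i: queue[i][0])
    keep = set(ranked[:slots])
    served = [cell for i, cell in enumerate(queue) if i in keep]
    denied = [cell for i, cell in enumerate(queue) if i not in keep]
    return served, denied


queue = [(0, "a"), (2, "b"), (1, "c"), (0, "d"), (2, "e")]
served, denied = serve_connection(queue, slots=3)
```

With three slots, both base-layer cells and the single cell from the first enhancement layer are served, while the two cells from the least important layer are denied service.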

However,  the  impact  of  an  individual  cell  loss  cannot  be  viewed  in  isolation.
Another  factor  to  consider  is  the  temporal  dependence  between  adjacent  cells  in 
ultimately  reconstructing  the  video  sequence.  Both  cell  losses  and  bit  errors  in 
transmission  create  gaps  in  the  incoming  bitstream  causing  the  decoder  to  lose  the 
synchronization  required  to  recognize  codewords  within  the  stream.  The  decoder  then
must  parse  forward  within  the  bit  stream  until  a  marker  is  found  to  re-enable 
synchronization.  Therefore,  if  a  cell  is  dropped  from  the  queue,  all  cells  up  to  but  not 
including  the  cell  containing  the  next  marker  are  not  usable  and  will  not  be  decoded.
This  situation  can  be  exploited  by  reacting  to  cell  loss  by  searching  for  related  cells 
rendered  unusable  and  discarding  them  to  open  scheduling  opportunities  for  other  cells. 
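
The discard rule can be sketched as follows, assuming the scheduler can tell which queued cells carry a resynchronization marker. The `(cell_id, has_marker)` tuple is a hypothetical representation chosen for illustration; how markers are actually identified within the network depends on the transport arrangement discussed in Chapter II.

```python
def discard_after_loss(cells, lost_index):
    """Drop the lost cell and every following cell before the next marker.

    cells: list of (cell_id, has_marker) tuples in transmission order.
    lost_index: position of the cell dropped from the queue.
    Returns (kept, discarded).
    """
    kept = list(cells[:lost_index])
    discarded = [cells[lost_index]]

    i = lost_index + 1
    # Cells up to, but not including, the next marker cell cannot be decoded.
    while i < len(cells) and not cells[i][1]:
        discarded.append(cells[i])
        i += 1

    kept.extend(cells[i:])
    return kept, discarded


cells = [("c0", True), ("c1", False), ("c2", False), ("c3", True), ("c4", False)]
kept, discarded = discard_after_loss(cells, lost_index=1)
```

In this example the loss of `c1` makes `c2` non-viable, so both are removed, and the two freed transmission opportunities become available to other cells.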

D.         DISSERTATION  ORGANIZATION 

The  dissertation  is  organized  as  follows.  We  start  with  a  discussion  of  general 
multimedia  network  architectures  and  traditional  video  codec  designs.  Next,  the 
elements  for  improving  network  distribution  of  low-bit-rate  video  are  presented.  These 
elements  include  design  of  a  suitable  low-complexity  layered  video  coder  for  tactical 
environments,  a  traffic-shaping  scheme  to  maximize  queuing  and  scheduling  efficiency, 
and  network  scheduling  algorithms  that  provide  QoS  support  for  layered  video  while 
maximizing  perceptual  quality  during  periods  of  congestion. 

Chapter  II  begins  with  an  overview  of  transmission  of  multimedia  traffic  in  both 
the  IP  and  the  ATM  environments,  followed  by  a  brief  discussion  of  ATM  and  the  related
ITU  standards  for  multimedia  terminals.  Since  layered  video  follows  a  strict  hierarchy  in
regard  to  perceptual  importance,  identifying  layers  within  the  network  is  crucial  to 
implementing  priority-based  scheduling.  Also,  as  dropped  cells  may  corrupt  future 
portions  of  the  video  stream,  either  within  a  layer  or  across  all  layers,  identifying  logical 
resynchronization  points  within  the  stream  allows  the  scheduler  to  make  intelligent
decisions  on  when  to  discard  cells.  Accomplishing  each  of  these  tasks  is  dependent  on 
the  manner  in  which  the  layered  video  stream  is  transmitted  within  an  ATM  network. 
Therefore,  two  approaches  are  examined:  multiplexing  all  layers  over  a  single  virtual 
channel  or  assigning  individual  layers  to  separate  virtual  channels. 

Chapter  III  provides  an  overview  of  hybrid  video  coding  along  with  a  brief 
introduction  to  the  three  components  of  video  coding:  transforms,  quantization,  and 
entropy  encoding.  The  notion  of  wavelet-based  image  compression  is  presented  as  a 
motivation  for  layered  video  transmission.  Chapter  IV  examines  the  problem  of  layered
coding  for  both  low-activity  motion  video  and  static  presentation  slides.  A  heuristic 
approach  to  designing  layering  schemes  for  motion  video  is  presented  and  a  particular 
scheme  for  low-bit-rate  video  is  proposed.  As  a  layering  structure  for  motion  video  is 
unsuited  for  static  presentation  slides,  another  layering  structure  is  proposed  emphasizing 
the  greater  perceptual  importance  of  the  high  frequency  content.  The  problem  of  rate 
control  for  layered  coding  is  examined  and  a  simple  open-loop  controller  is  proposed. 

Chapter  V  discusses  the  concept  of  traffic  smoothing  for  increasing  queuing 
efficiency  and  scheduler  performance.  An  integrated  smoothing  scheme  is  proposed  that 
smooths traffic at three time scales: interframe, intraframe, and across the layer
hierarchy.  Implementation  within  the  context  of  an  ATM  network  is  also  considered. 
Chapter  VI  addresses  the  issue  of  scheduling  layered  video  traffic.  Several  algorithms 
are  proposed  to  maximize  throughput  while  exploiting  the  opportunities  provided  by 
layered  video  to  reallocate  bandwidth  within  a  connection  as  required  to  preserve  the 
higher priority layers. A cell-discard policy is also discussed that accounts for the
interdependence  of  cells  in  the  traffic  flow,  both  within  a  layer  and  across  layers. 
Simulation  results  illustrating  the  different  algorithms  are  presented  and  discussed. 

Chapter  VII  summarizes  the  significant  contributions  made  in  the  dissertation  and 
provides  concluding  remarks  along  with  a  discussion  of  possible  topics  for  future 
research  in  layered  video  transmission  and  related  areas. 

Appendix  A  presents  the  OPNET  process  models  used  to  validate  the  behavior  of 
the  layered  scheduling  algorithms  presented  in  Chapter  VI.  Appendix  B  presents  a 
suitable  video  traffic  model  used  to  simulate  the  behavior  of  a  rate-controlled  video 
traffic  stream. 
II.         NETWORK ARCHITECTURES FOR MULTIMEDIA TRAFFIC

Before  introducing  the  topics  of  video  compression  and  scheduling,  we  examine 
integrated  services  network  architectures  appropriate  for  video  teleconferencing.  We 
start  by  considering  the  characteristics  of  a  generic  m  to  n  VTC  application.  VTC 
applications  are  inherently  real-time  interactive,  transmit  continuous  media  as  well  as 
discrete,  and  operate  in  multicast  mode.  The  interactive  and  continuous  nature  of  the 
application  suggests  that  strict  bounds  are  required  on  both  delay  and  delay  jitter.  Since 
both  video  and  audio  traffic  are  generally  compressed,  packet  losses  must  be  limited  to 
avoid  excessive  reconstruction  errors.  Summarizing,  the  characteristics  of  VTC 
applications  imply  the  following  requirements:  multicast  support,  QoS  guarantees,  and 
real-time  support.  Based  on  these  requirements,  two  network  architectures  provide  a 
suitable  basis  for  VTC  [3]:  IP-based  networking  in  conjunction  with  RTP  and  ATM 
networking. 

The  purpose  of  this  chapter  is  to  refine  the  networking  scenario  underlying  the 
VTC  application  and  provide  a  context  for  the  work  presented  in  this  dissertation.  While 
multicast  IP  is  briefly  considered,  a  wireless  ATM  network  appears  more  suitable  for 
tactical  VTC  applications  and  is  covered  in  far  greater  detail.  Emphasis  is  placed  on 
describing  ATM's  support  for  different  traffic  types,  QoS  support,  and  connection  setup 
using  a  simple  layered  protocol  model  to  indicate  where  each  level  of  functionality  is 
implemented.  The  ATM  cell  format  is  examined,  and  an  overview  of  ATM  multicast 
implementations  is  presented.  Two  other  related  topics  are  covered  in  some  detail:  a 
brief  introduction  to  wireless  networking,  focusing  on  the  data  link  control  and  physical 
layers,  and  coverage  of  ITU  multimedia  terminal  standards  that  pertain  to  ATM 
networks. 

The  final  issue  considered  is  support  of  layered  video  traffic  within  the  context  of 
established  ATM  networking  standards.  The  first  problem  is  how  to  map  individual 
video layers onto ATM connections. All of the layers may be interleaved over a single
logical connection or transmitted separately using individual connections. The
implications  of  both  approaches  are  considered  and  presented  along  with  the  attending 
advantages  and  liabilities.  The  second  problem  is  facilitating  layer  identification  within 
the  network  to  implement  an  appropriate  scheduling  algorithm.  In  some  cases,  it  is  also 
valuable  to  identify  other  elements  within  each  layer,  such  as  the  positioning  of  frame 
and  group  of  blocks  (GOB)  headers.  Identification  is  complicated  by  the  deliberate 
simplicity  of  the  ATM  cell  header  since  the  user  has  limited  means  for  altering  fields 
within  the  header.  Two  cell  tagging  schemes  are  presented  to  accommodate  this,  one  for 
the  single  connection  case  and  the  other  for  the  multiple  connection  case. 

A.         LEGACY  IP-BASED  NETWORK 

Although  the  TCP/IP  protocol  suite  is  the  dominant  commercial  architecture  for 
internetworking, TCP/IP is not practical for real-time multimedia applications. Still,
IP-based networks are so prevalent that incentives exist for working within the limitations
imposed by IP to add some support for real-time traffic. The current approach is to use
RTP over UDP/IP to provide real-time support for a video application, as shown by
the protocol stack in Figure II.1. The following paragraphs consider both
TCP/IP  networking  and  RTP  over  IP;  the  latter  is  termed  the  legacy  approach  to  real-time 
networking.  TCP/IP  is  considered  primarily  to  show  how  the  design  decisions,  while 
appropriate  for  the  type  of  traffic  originally  envisioned,  preclude  real-time  support. 
Discussion  of  the  lower  layers  is  deferred  until  later  when  wireless  networking  is 
considered. 
Figure II.1: IP-Based Network Protocol Stack for Real-time Traffic.

1.  IP and Multicast IP


In  regard  to  real-time  traffic,  IP  is  effectively  neutral.  IP  provides  a  connectionless 
service  to  higher  layers,  providing  only  "best  effort"  delivery  of  datagrams  [19].  Best- 
effort  service  does  not  guarantee  that  any  data  transmitted  will  ultimately  be  delivered  or 
arrive  in  any  particular  order.  Connectionless  service  was  chosen  for  IP  since  datagrams 
traveling  through  different  networks  might  encounter  a  variety  of  protocols.  By  offering 
only  an  unreliable  service,  IP  requires  very  few  services  from  the  constituent  networks 
traversed  by  datagrams.  Any  additional  end-to-end  services,  such  as  a  reliable, 
connection-oriented  service,  are  added  by  transport  layer  protocols,  such  as  TCP,  if 
needed.  However,  best-effort  service  precludes  any  notion  of  QoS  by  definition. 
Although  higher  layers  may  add  additional  functionality  to  control  information  loss,  other 
QoS  parameters,  such  as  delay  and  delay  jitter,  cannot  be  guaranteed.  Even  worse,  if  any 
part  of  a  network  transmission  path  includes  an  IP  network,  no  explicit  QoS  guarantees 
are  possible  regardless  of  the  capabilities  of  the  other  networks  in  the  path. 

IPv4  has  been  extended  through  various  efforts  to  provide  multicast  functionality 
though  support  must  be  regarded  as  experimental  since  currently  most  IP  routers  do  not 
explicitly  provide  multicast  service.  The  best  example  of  multicast  IP  is  MBone 
(multicast  backbone),  an  outgrowth  of  early  multicast  experiments  during  the  formulation 
of the IP multicast protocol [20]. MBone consists of a virtual network of multicast
routers  or  mrouters.  Multicast  packets  are  transmitted  point-to-point  between  mrouters, 
using  tunneling  as  necessary  to  traverse  ordinary  routers  [21].    Several  audio  and  video 
tools have been written to take advantage of MBone, but they are restricted primarily to
the  Unix  platform  [1].  The  next  generation  of  IP,  IPv6,  explicitly  supports  multicast 
functionality  [18]. 

2.  Transport  Layer  Protocols 

TCP  is  a  transport  protocol  that  provides  the  reliable,  connection-oriented  service 
lacking  in  IP  and  guarantees  sequential  delivery  of  data  to  the  application  layer. 
However,  this  very  service  precludes  the  use  of  TCP  for  real-time  multicast  applications 
[18].  TCP  is  a  point-to-point  protocol;  TCP  connections  are  established  between  two  end 
users.  Reliability  and  sequencing  are  provided  through  a  system  of  acknowledgments 
and  retransmissions  [19].  However,  real-time  applications  have  stringent  delay 
requirements  and  retransmitted  segments  usually  cannot  arrive  in  time  to  provide  a 
benefit.  In  this  case,  retransmissions  merely  waste  bandwidth.  TCP  also  includes  a 
window-based  flow  control  scheme  to  prevent  faster  systems  from  overwhelming  slower 
systems  with  data  and  to  implement  congestion  control  schemes.  However,  the  same 
scheme  impedes  delivery  of  streaming  data. 

For  these  reasons,  UDP  is  favored  for  real-time  traffic,  offering  simple  transport 
layer  access  to  IP  with  low  overhead.  While  UDP  provides  no  explicit  support  for  real- 
time applications,  real-time  traffic  is  not  impeded  as  in  the  TCP  case  [18]. 

3.  Real-time  Transport  Protocol 

RTP  is  a  lightweight  transport  protocol  for  real-time  applications  and  employs 
UDP  for  access  to  both  IP  and  multicast  IP.  RTP  does  not  provide  either  reliable  service 
or  QoS  guarantees  since  the  underlying  IP  layer  precludes  these  services.  RTP  does 
provide  a  framework  of  services  to  the  application  that  allows  the  application  to  monitor 
and  compensate  for  the  actual  QoS  the  network  is  delivering  to  the  recipients.  RTP 
follows  the  concept  of  application-level  framing  [11]  as  posed  by  the  following  scenario. 
The  sending  application  transmits  data  continuously  to  one  or  more  receiving 
applications.  Each  receiving  application  is  able  to  accept  less  than  perfect  delivery  and 
still continue operating, thus negating the need for retransmissions. For example, a video
decoder parses past missing data and resynchronizes as required to restart decoding.
However,  each  receiver  does  monitor  the  QoS  provided  by  the  network,  in  terms  of 
delay,  delay  jitter,  and  packet  loss,  and  relays  the  information  back  to  the  sender.  Taken 
collectively,  the  feedback  reports  indicate  network  conditions  and  provide  an  opportunity 
for  the  sender  to  adapt  in  hopes  of  obtaining  better  QoS.  If  receivers  report  high  packet 
losses,  indicating  possible  network  congestion,  the  sender  might  move  to  a  lower-quality 
transmission  to  place  a  smaller  demand  on  the  network.  To  benefit  from  RTP,  the 
application  must  be  controllable,  that  is,  able  to  adjust  bandwidth  requirements 
dynamically  as  dictated  by  network  conditions.  A  video  coder,  for  example,  could  reduce 
frame  rate,  resolution,  or  perceptual  quality  [9]. 
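
The feedback loop described above can be sketched as follows. This is only an illustration: the function name, loss thresholds, step sizes, and rate limits are hypothetical values chosen for the example, and RTP itself prescribes no particular adaptation rule.

```python
def adapt_bit_rate(current_kbps, loss_fractions,
                   high_loss=0.05, low_loss=0.01,
                   min_kbps=32.0, max_kbps=384.0):
    """Return a new target bit rate from receivers' reported loss fractions.

    All thresholds and limits are hypothetical values for illustration.
    """
    if not loss_fractions:
        return current_kbps              # no feedback yet: keep the current rate
    worst = max(loss_fractions)          # conservative: follow the worst receiver
    if worst > high_loss:
        new_rate = current_kbps * 0.75   # back off when heavy loss is reported
    elif worst < low_loss:
        new_rate = current_kbps + 16.0   # probe upward cautiously when loss is low
    else:
        new_rate = current_kbps          # loss is tolerable: hold the rate
    return max(min_kbps, min(max_kbps, new_rate))
```

Following the worst receiver is one possible policy; with heterogeneous receivers it penalizes well-connected participants, which is the control problem noted later for large sessions.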

RFC 1889 specifies both a data transfer protocol, simply termed RTP, and an RTP
control  protocol,  RTCP  [10].  RTP  supports  either  unicast  or  multicast  transmission  by 
organizing  participating  RTP  entities  into  a  session.  Each  entity  transmits  data  to  the 
session  through  a  single  UDP  port  using  an  application-level  packet  format  defined  by 
the  protocol.  RTP  packet  headers  identify  the  payload  type:  the  media  type  (audio  or 
video)  and  the  format  (G.728  audio  or  H.261  video)  [22].  The  header  also  provides  a 
source  identifier  to  indicate  the  multicast  group  generating  the  data,  a  sequence  number 
for  loss  detection,  and  a  timestamp  for  recording  the  time  the  first  byte  of  data  was 
generated.  The  timestamp  allows  synchronization  among  different  streams. 
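
As a concrete illustration of these header fields, the following sketch packs and parses the 12-octet fixed RTP header and uses the sequence number to count lost packets. It assumes no padding, no header extension, and no contributing-source list; the function names are ours, and the payload type numbers are the static assignments from the companion audio/video profile.

```python
import struct

RTP_VERSION = 2
PT_G728 = 15   # static payload type for G.728 audio
PT_H261 = 31   # static payload type for H.261 video

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-octet fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    """Parse the fixed header fields from the first 12 octets of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}

def missing_between(prev_seq, seq):
    """Number of packets lost between two received sequence numbers.

    The 16-bit sequence number wraps around, hence the modulo.
    """
    return (seq - prev_seq - 1) % 65536
```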

RTCP  provides  for  feedback  reports  to  sending  applications  as  well  as  reports  to 
all  members  of  the  multicast  session  [10][18].  Reports  are  transmitted  through  a  separate 
UDP  port  from  RTP  packets.  Receiver  reports  provide  feedback  on  observed  QoS  to  the 
sending  entity.  Sender  reports  are  used  to  alert  participants  when  multiple  source 
identifiers  are  related,  such  as  synchronized  audio  and  video  streams,  and  should  be 
received  together.  Each  session  member  also  periodically  sends  status  reports  that 
collectively  allow  other  members  to  estimate  the  size  of  the  session.  Session  size  is  used 
to  scale  the  report  transmission  rate  to  avoid  overburdening  the  network. 
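
The scaling of report rate with session size can be sketched as below. This is a simplified version of the interval rule in RFC 1889: it keeps the idea that control traffic is limited to a small fraction (5%) of the session bandwidth with a minimum interval of 5 seconds, but omits the sender/receiver bandwidth split and the randomization of the interval. The average report size of 100 octets is an assumed value.

```python
def rtcp_interval(members, session_bw_bps, avg_rtcp_packet_bytes=100,
                  rtcp_fraction=0.05, min_interval_s=5.0):
    """Seconds between reports from one member (simplified sketch)."""
    # Bandwidth budget for all control traffic, in octets per second.
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8.0
    # Every member shares the budget, so the interval grows with session size.
    interval = members * avg_rtcp_packet_bytes / rtcp_bw_bytes_per_s
    return max(min_interval_s, interval)
```

For a 128 kbps session, four members report at the 5-second floor, whereas 100 members must space their reports 12.5 seconds apart to stay within the same budget.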

An  important  point  is  that  RTP  does  not  provide  a  mechanism  or  algorithm  for 
determining the manner in which the sender interprets feedback reports and adjusts
network demands. Instead, the application must be written to take advantage of RTP,
which  suggests  that  RTP  should  be  viewed  as  more  of  an  application  framework  than  a 
complete  networking  protocol  [18]. 

4.  Suitability  of  RTP/IP  for  VTC 

The  introduction  to  this  section  indicated  three  features  required  for  a  networking 
architecture to fully support video conferencing. The legacy RTP/IP network architecture
provides  adequate  real-time  and  multicast  support,  yet  the  architecture  falls  short  in  two 
areas.  First,  applications  use  RTCP  receiver  reports  to  mitigate  the  effects  of  congestion. 
With  large  or  heterogeneous  networks,  the  reports  may  vary  significantly  since  each 
receiver  experiences  different  network  conditions.  This  greatly  complicates  the  control 
issue  although  it  is  correctable  to  some  extent  with  RLM  [45].  Second,  and  more 
significant,  the  lack  of  QoS  guarantees  may  lead  to  unsatisfactory  reconstruction  of  the 
audio  and  video  streams.  IP  routers  do  not  guarantee  QoS  since  IP  routing  does  not 
incorporate  the  concept  of  resource  reservation  and  only  provides  service  through 
variants  of  first-come,  first-serve  (FCFS)  scheduling.  The  new  Resource  reSerVation 
Protocol  (RSVP)  has  been  developed  to  provide  support  for  QoS  under  the  proposed 
Integrated  Services  Architecture  (ISA)  [18].  Each  router  running  RSVP  must  implement 
an  admission  control  scheme,  a  scheduling  scheme,  and  be  able  to  classify  packets 
according  to  QoS  requirements.  At  this  point,  RSVP  is  not  widely  implemented  and  its 
capabilities  are  already  duplicated  by  the  more  mature  ATM  network  architecture. 

B.         ATM  NETWORKS 

ATM  grew  out  of  the  desire  to  utilize  the  high  bandwidth  available  from  optical 
fiber  to  create  a  Broadband  Integrated  Services  Digital  Network  (B-ISDN)  that  is  able  to 
support  audio,  video,  and  data  services  within  the  same  network  [27].  In  contrast  to 
TCP/IP,  where  the  end-user  transport  layers  provide  only  reliable  service  and  network 
delivery  is  best  effort,  ATM  networks  provide  QoS  guarantees.  ATM  guarantees  QoS  by 
comparing  the  caller's  QoS  requirements  to  available  network  resources  and  then 
allowing  a  connection  if  sufficient  resources  exist  [18].  Resources  are  reserved  for  the 
duration of the connection. ATM distinguishes among several different service or traffic
classes,  such  as  the  real-time  and  non-real-time  traffic  at  constant  and  variable  bit  rates, 
and  provides  support  through  a  combination  of  QoS  primitives  and  transport  layer 
adaptation. 

ATM represents a middle ground between PSTN circuit-switched networks and
connectionless  packet-switched  networks.  ATM  uses  virtual  circuits  to  simplify 
switching  decisions  but  allows  several  connections  to  be  multiplexed  over  a  single 
physical  interface  to  promote  efficient  bandwidth  utilization.  Virtual  circuits  imply 
connection-oriented  service,  but  ATM  also  provides  the  equivalent  of  connectionless 
service  to  support  the  widest  range  of  applications  possible. 

ATM  was  designed  to  support  high  bit  rate  connections,  such  as  OC-3  (155 
Mbps)  and  OC-12  (622  Mbps)  over  fiber  [23]  [24].  The  decision  to  employ  fiber,  a 
physical  medium  with  extremely  small  bit-error-rates  (BER),  allows  ATM  to  minimize 
both  error  and  flow  control  functionality.  Minimizing  these  capabilities  reduces  overhead 
in  processing  ATM  cells  and  decreases  the  header  bits  required  per  cell,  thus  allowing 
fast  switching  speeds  and  efficient  data  transport.  High  speed  switching  is  further 
supported  by  use  of  small,  fixed-length  cells. 

1.  ATM  Protocol  Model 

The  ATM  protocol  model  is  shown  in  Figure  II.2  [23].  The  protocol  model 
consists  of  three  separate  planes:  management,  control,  and  user.  The  management  plane 
provides  management  functions  and  exchanges  information  between  the  control  and  user 
planes.  The  control  plane  deals  with  call  establishment,  connection  control,  and  call 
release.  To  provide  these  functions,  the  control  plane  has  access  to  the  network  and 
separate  signaling  protocols  and  cell  definitions.  The  user  plane  supports  transfer  of  user 
information  by  providing  such  functionality  as  flow  and  error  control,  timestamps  for 
synchronization,  and  sequencing. 

The  user  plane  includes  the  ATM  Adaptation  Layer  (AAL),  the  ATM  layer,  and 
the  physical  layer.  The  AAL  is  a  service  dependent  layer  and  adapts  information  streams 
from  higher  layers  for  transmission  over  ATM.  Example  streams  include  compressed 
video, constant bit rate (CBR) audio, or even IP datagrams. Each has distinct service
requirements.  The  AAL  maps  data  and  service  requirements  from  these  streams  to 
services  provided  by  the  ATM  layer.  The  ATM  layer  provides  data  transport  using  cells 
over  an  end-to-end  logical  connection  and  controls  access  to  the  underlying  physical 
layer. 

The physical layer is medium dependent. The physical layer includes two
sublayers: the physical-media-dependent (PMD) sublayer and the medium-independent
transmission-convergence (TC) sublayer [23]. The former deals with aspects that are dependent on the
transmission medium selected (e.g., bit timing and line coding). The latter handles issues
that are independent of the transmission medium characteristics, such as error control or
determination of cell boundaries in the physical layer payload. ATM specifies SONET, a
fiber standard that provides synchronous time-multiplexed transmission at high bit rates, as
the basic physical layer interface. Other physical layer interfaces, such as UTP [25] [26],
are specified to promote interoperability.


Figure II.2: ATM Protocol Architecture [23].

2.  Logical Connections

End-to-end  connections  in  ATM  are  defined  in  terms  of  a  virtual  channel 
connection (VCC) and a virtual path connection (VPC). Figure II.3 illustrates the role of
VCCs  and  VPCs  within  an  ATM  network.  VCCs  are  created  dynamically  between  two 
end  users  to  provide  a  unidirectional  channel  for  ATM  cells  carrying  user  data  and  are 
terminated  at  call  release.  Cells  are  carried  in  sequence.  VCCs  are  also  set  up  between 
an  end  user  and  the  network  to  carry  control  signals  and  between  network  nodes  to 
facilitate  network  management  and  routing.  These  connections  cross  the  user-network 
interface  (UNI)  and  the  network-network  interface  (NNI),  respectively. 

[Figure labels: Network Node: ATM Switch. Multiplexing Buffer: Admission Control is Imposed Here (the AP).]

Figure  II.3:  ATM  Network  Configuration  [27]. 

ATM  networks  include  a  higher  level  of  connectivity  in  the  form  of  virtual  paths. 
The  virtual  path  concept  is  motivated  by  the  trend  toward  increasing  bandwidth,  which 
also increases the possible number of connections a channel may carry. Compared to IP
networks,  ATM's  circuit-oriented  structure  and  QoS  guarantees  incur  greater  control 
costs.  Since  these  control  costs  scale  with  the  number  of  connections,  virtual  paths 
decrease  cost  by  reducing  the  number  of  connections  managed  by  the  network.  A  VPC 
represents  a  network-defined,  end-to-end  connection  representing  a  set  route  through  the 
network and providing a specified QoS such as bandwidth. Each VPC carries multiple
VCCs  with  these  same  end-points,  and  all  associated  cells  are  switched  along  the  same 
path.  Since  most  of  the  work  required  to  establish  a  connection  (reserving  capacity  and 
calculating  routes)  is  performed  when  a  VPC  is  established,  call  setup  time  for  new 
VCCs  is  greatly  reduced. 
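
The saving from virtual paths can be illustrated with a small sketch of a switch that forwards on the VPI alone. The class and method names are hypothetical. The point is that one table entry serves every VCC carried inside the VPC, with the VCI passed through unchanged.

```python
class VirtualPathSwitch:
    """Toy VP switch: forwarding state is kept per virtual path, not per channel."""

    def __init__(self):
        self.table = {}   # (in_port, in_vpi) -> (out_port, out_vpi)

    def add_path(self, in_port, in_vpi, out_port, out_vpi):
        self.table[(in_port, in_vpi)] = (out_port, out_vpi)

    def forward(self, in_port, vpi, vci):
        # Only the VPI is looked up and rewritten; the VCI is left untouched,
        # so new VCCs inside an existing VPC need no new state in this switch.
        out_port, out_vpi = self.table[(in_port, vpi)]
        return out_port, out_vpi, vci
```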

3.  ATM  Cell  Format 

ATM  employs  fixed-size  cells  consisting  of  a  5-octet  header  and  a  48-octet 
information  field.  The  cell  header  format  differs  depending  on  whether  the  cell  is 
entering the network (UNI) or moving within the network (NNI). Figure II.4 shows the
ATM  cell  format  at  the  UNI.  NNI  ATM  cells  do  not  retain  the  generic  flow  control 
(GFC)  field;  instead  they  use  the  bits  to  expand  the  virtual  path  identifier  (VPI)  from  8  to 
12  bits. 

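                Bit Position
                7      6      5      4      3      2      1      0
Octet 1         Generic Flow Control       | Virtual Path Identifier
Octet 2         Virtual Path Identifier    | Virtual Channel Identifier
Octet 3         Virtual Channel Identifier
Octet 4         Virtual Channel Identifier | Payload Type        | CLP
Octet 5         Header Error Control
Octets 6-53     Information Field (48 octets)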

Figure  II.4:  ATM  Cell  Format  at  the  UNI  [23]. 

The GFC field is used to control cell flow at the UNI, although its application
remains  an  area  of  active  study  [18].  The  GFC  is  not  carried  end-to-end  and  is 
overwritten  by  ATM  switches  to  expand  the  VPI. 

The  VPI  identifies  a  routing  path  within  the  network.  The  field  width  is  8  bits  at 
the  UNI  and  12  bits  within  the  NNI,  thereby  allowing  a  greater  number  of  virtual  paths 
within  the  network.  The  virtual  channel  identifier  (VCI)  identifies  an  end-to-end  routing 
path and functions similarly to the ports in TCP or UDP.

The payload type (PT) is a 3-bit field used to indicate the type of data in the
information  field.  A  high  order  bit  of  0  indicates  a  user  data  cell;  1  indicates  either  a 
resource  management  (RM)  cell  or  a  cell  carrying  maintenance  information.  The  second 
bit  is  initially  cleared  at  the  UNI.  Within  the  NNI,  a  switch  sets  the  second  bit  whenever 
congestion is experienced. Switches downstream can monitor this bit to gauge network
conditions.  The  third  bit  is  the  service  data  unit  (SDU)  type  bit  and  allows  the  user  to 
designate  two  types  of  SDUs.  One  use  of  the  SDU  bit  is  to  implement  different  service 
strategies  for  ATM  cells  based  on  their  content. 

The  cell  loss  priority  (CLP)  field  is  set  by  the  user  to  indicate  the  relative  priority 
of  cells  in  case  congestion  forces  a  switch  to  discard  cells.  A  value  of  0  indicates  higher 
priority,  and  the  cell  should  be  dropped  only  as  a  last  resort;  1  indicates  a  lower  priority 
cell  that  a  switch  may  drop  to  ease  congestion.  As  part  of  call  setup,  the  user  negotiates  a 
contract  with  the  network  and  agrees  to  transmit  data  in  accordance  with  various  traffic 
parameters.  The  user  may  negotiate  separate  contracts  for  CLP  =  0  and  CLP  =  1  traffic. 
Network  switches  also  set  the  CLP  bit  for  any  data  cell  in  violation  of  its  traffic  contract 
even  if  the  switch  has  sufficient  capacity  to  transmit  the  cell.  Subsequent  switches  may 
then  discard  the  cell  as  required. 
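
One simple way a switch buffer might honor the CLP bit is sketched below. The threshold scheme shown is an assumption made for illustration and is not a policy taken from the text: low-priority cells are refused once the buffer passes a threshold, while high-priority cells are refused only when the buffer is full.

```python
from collections import deque

class ClpAwareQueue:
    """Toy output buffer that discards CLP = 1 cells before CLP = 0 cells."""

    def __init__(self, capacity, clp1_threshold):
        self.capacity = capacity              # total buffer size in cells
        self.clp1_threshold = clp1_threshold  # occupancy at which CLP = 1 is refused
        self.cells = deque()

    def enqueue(self, cell_id, clp):
        """Return True if the cell is queued, False if it is discarded."""
        occupancy = len(self.cells)
        if occupancy >= self.capacity:
            return False                      # buffer full: even CLP = 0 is lost
        if clp == 1 and occupancy >= self.clp1_threshold:
            return False                      # congestion: shed low priority first
        self.cells.append((cell_id, clp))
        return True
```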

ATM  cells  include  an  8-bit  header  error  control  (HEC)  field  calculated  based  on 
the  first  four  octets  of  the  header.  The  HEC  allows  detection  of  errors  and  correction  of 
single-bit  errors.  If  a  multi-bit  error  is  detected,  the  cell  is  discarded.  No  error  detection 
is  provided  for  the  information  field. 
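
The header fields described in this section can be made concrete with a short sketch that packs and parses a UNI cell header. The function names are ours. The HEC is computed here as a CRC-8 with generator x^8 + x^2 + x + 1 over the first four octets, with the result XORed with 0x55, as specified for the B-ISDN physical layer. The sketch only detects header errors and does not attempt single-bit correction.

```python
def crc8_hec(first_four):
    """CRC-8 (generator x^8 + x^2 + x + 1) over four octets, XORed with 0x55."""
    crc = 0
    for byte in first_four:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55

def pack_uni_header(gfc, vpi, vci, pt, clp):
    """Build the 5-octet UNI header: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) HEC(8)."""
    word = ((gfc & 0xF) << 28) | ((vpi & 0xFF) << 20) | \
           ((vci & 0xFFFF) << 4) | ((pt & 0x7) << 1) | (clp & 0x1)
    first_four = word.to_bytes(4, "big")
    return first_four + bytes([crc8_hec(first_four)])

def unpack_uni_header(header):
    """Parse a UNI header, raising ValueError if the HEC does not match."""
    if crc8_hec(header[:4]) != header[4]:
        raise ValueError("HEC mismatch")
    word = int.from_bytes(header[:4], "big")
    pt = (word >> 1) & 0x7
    return {
        "gfc": word >> 28,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pt": pt,
        "clp": word & 0x1,
        "user_data": (pt & 0b100) == 0,   # high-order PT bit: 0 means user data
        "congestion": bool(pt & 0b010),   # meaningful for user data cells only
        "sdu_type": pt & 0b001,
    }
```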

4.  ATM  Service  Classes 

ATM  is  designed  to  support  a  wide  range  of  applications:  from  interactive 
applications,  such  as  video  and  multimedia  conferencing,  to  distribution  services,  such  as 
archive  retrieval  and  document  browsing  [27].  Recall  that  each  application  transmits  a 
sequence  of  cells  through  a  virtual  channel  connection.  Providing  the  desired  QoS  to  a 
new  VCC  depends  on  the  new  connection's  traffic  flow  characteristics  as  well  as  the 
characteristics  of  existing  VCCs.  Traffic  handling,  from  call  acceptance  to  network 
scheduling,  is  therefore  simplified  by  defining  discrete  service  categories.  The  ATM 
Forum has defined five ATM layer service classes as shown in Table II.1 [28]. Each
VCC  established  receives  service  in  accordance  with  one  of  these  categories. 

Interactivity            Service Class

Real-time service        Constant bit rate (CBR)
                         Real-time variable bit rate (rt-VBR)

Non-real-time service    Non-real-time variable bit rate (nrt-VBR)
                         Available bit rate (ABR)
                         Unspecified bit rate (UBR)

Table II.1: ATM Service Classes [28].

Real-time  services  are  characterized  by  low  tolerance  for  delay  and  delay  jitter. 
Applications  that  involve  human  interactivity,  such  as  video  conferencing,  are  real-time 
since  excessive  delay  degrades  the  perception  of  true  interactivity  and  jitter  impedes  the 
smooth  playback  of  audio  and  video.  The  two  services  defined  for  real-time  service, 
CBR  and  rt-VBR,  are  distinguished  by  variation  in  data  rate.  CBR,  as  expected, 
transmits  data  at  a  fixed  rate  and  is  the  easiest  service  to  support.  Applications  include 
both  compressed  and  uncompressed  data.  Toll-quality  PCM  speech  requires  a  constant 
data  rate  of  64  kbps.  H.261  was  designed  to  support  transmission  over  one  or  more  ISDN 
B  channels  and  compresses  video  at  a  multiple  of  64  kbps.  CBR  is  commonly  employed 
for  uncompressed  applications,  such  as  broadcast  quality  video  conferencing  and 
interactive  audio.  Rt-VBR  applications  have  data  rates  that  are  "bursty"  and  time- 
varying  and  are  characterized  by  a  mean  bit  rate  and  a  peak  bit  rate.  Compressed  video  is 
inherently  VBR  since  compression  gain  naturally  varies  with  each  frame  depending  on 
scene  content  (see  Chapter  III).  Rt-VBR  is  more  difficult  for  networks  to  support  but 
provides  greater  flexibility  than  CBR.  VBR  streams  may  be  statistically  multiplexed 
over  the  same  channel  for  more  efficient  use  of  bandwidth. 

Non-real-time  services  are  intended  for  bursty  traffic  without  stringent 
requirements  on  delay  and  jitter,  thus  giving  a  network  more  flexibility  in  dealing  with 
these  traffic  flows.  Nrt-VBR  applications  generate  VBR  data  that  does  not  require  strict 
limits on delay but does require some upper bound. Examples include banking and
airline  transactions  [18].  UBR  service  is  best  effort  service  similar  to  that  provided  by 
IP-based  networks.  UBR  connections  receive  no  dedicated  resources;  bandwidth  is 
provided  dynamically  from  spare  channel  capacity  not  utilized  by  CBR  and  VBR  traffic. 
ABR  improves  upon  UBR's  best  effort  service.  ABR  applications  specify  both  a 
minimum  cell  rate  (MCR)  and  a  peak  cell  rate  (PCR).  At  any  time,  the  network  ensures  a 
fair  allocation  of  resources  among  all  ABR  connections  such  that  each  connection 
receives at least its MCR, and possibly up to the PCR, depending on available capacity.
TCP connections and LAN traffic commonly employ ABR service. Figure II.5 shows
how  channel  capacity  could  be  allocated  to  each  service  category. 


[Figure axes: channel capacity (up to 100%) versus time.]
Figure  II.5:  Bandwidth  Allocation  for  ATM  Service  Categories  [18]. 
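
The ABR allocation rule described above (at least the MCR, at most the PCR, with the remainder shared fairly) can be sketched as follows. The fairness criterion used here, equal shares of the capacity left over after all MCRs are met, is one of several possible definitions and is an assumption of this example, as is the function name. The `available` argument stands for the capacity not consumed by CBR and VBR traffic.

```python
def allocate_abr(available, connections):
    """Share capacity among ABR connections.

    connections maps a name to (mcr, pcr); rates use the same unit as available.
    """
    total_mcr = sum(mcr for mcr, _ in connections.values())
    if total_mcr > available:
        raise ValueError("MCRs exceed available capacity")
    # Every connection first receives its guaranteed minimum.
    rates = {name: float(mcr) for name, (mcr, _) in connections.items()}
    spare = available - total_mcr
    active = {name for name, (mcr, pcr) in connections.items() if pcr > mcr}
    # Hand out the spare capacity in equal shares, capping each at its PCR.
    while spare > 1e-9 and active:
        share = spare / len(active)
        for name in sorted(active):
            grant = min(share, connections[name][1] - rates[name])
            rates[name] += grant
            spare -= grant
        active = {n for n in active if connections[n][1] - rates[n] > 1e-9}
    return rates
```

With 10 units available and two connections with (MCR, PCR) of (1, 3) and (1, 10), the first is capped at its PCR of 3 and the second receives the remaining 7.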

At  call  setup,  a  user  requests  service  by  supplying  the  network  with  traffic 
descriptors  that  characterize  the  cell  flow  and  the  required  QoS.  The  exact  parameters 
provided  are  service  dependent.  Traffic  descriptors  allow  the  network  to  determine  if 
sufficient  resources  are  available  to  support  the  connection's  QoS  requirements.  For 
example,  a  user  requesting  rt-VBR  service  must  supply  the  PCR,  the  sustainable  cell  rate 
(SCR),  and  the  maximum  burst  size  of  cells  (MBS).  A  CBR  connection  provides  only 
the  PCR.  The  QoS  desired  is  specified  in  terms  of  cell  delay  variation  (CDV),  maximum 
cell  transfer  delay  (maxCTD),  and  cell  loss  ratio  (CLR).  Real-time  services  require  all 
three  QoS  parameters  be  specified.  Non-real-time  services  do  not  specify  any  QoS 
parameters  except  for  nrt-VBR,  which  specifies  CLR. 
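
The parameter requirements just listed can be captured in a small lookup, sketched below. Only the requirements stated in this section are encoded; the traffic descriptors for nrt-VBR and UBR are not given here and are therefore left out rather than guessed. The names are illustrative.

```python
# Traffic descriptors named in the text for each service class.
REQUIRED_TRAFFIC = {
    "CBR": {"PCR"},
    "rt-VBR": {"PCR", "SCR", "MBS"},
    "ABR": {"PCR", "MCR"},
}

# QoS parameters each class must specify according to the text.
REQUIRED_QOS = {
    "CBR": {"CDV", "maxCTD", "CLR"},
    "rt-VBR": {"CDV", "maxCTD", "CLR"},
    "nrt-VBR": {"CLR"},
    "ABR": set(),
    "UBR": set(),
}

def missing_parameters(service_class, supplied):
    """Return the required parameters absent from a connection request."""
    required = REQUIRED_TRAFFIC.get(service_class, set()) | REQUIRED_QOS[service_class]
    return required - set(supplied)
```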

A connection is accepted only if the network can reserve sufficient resources while
maintaining  the  QoS  of  existing  connections.  Assuming  the  connection  is  accepted,  the 
traffic  descriptors  and  QoS  parameters  form  a  traffic  contract  between  the  user  and 
network.  The  user  agrees  to  transmit  in  accordance  with  the  traffic  parameters.  In  turn, 
the network guarantees the QoS parameters for the duration of the connection. Once the
connection  is  active,  the  network  performs  traffic  policing  to  ensure  compliance.  If  the 
user  violates  the  traffic  contract,  perhaps  by  exceeding  the  SCR,  offending  cells  may  be 
tagged  using  the  CLP  bit  or  discarded. 
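
Traffic policing of this kind is commonly implemented with the generic cell rate algorithm (GCRA), which is not described in the text above and is shown here only as an illustrative sketch in its virtual scheduling form. The increment is the nominal spacing between cells (for example, the reciprocal of the policed cell rate) and the limit is the tolerance allowed for early arrivals. The class and function names are ours.

```python
class Gcra:
    """Virtual scheduling form of the generic cell rate algorithm."""

    def __init__(self, increment, limit):
        self.increment = increment   # nominal inter-cell spacing
        self.limit = limit           # tolerance for cells arriving early
        self.tat = 0.0               # theoretical arrival time of the next cell

    def conforms(self, arrival_time):
        if arrival_time < self.tat - self.limit:
            return False             # too early: non-conforming, state unchanged
        self.tat = max(arrival_time, self.tat) + self.increment
        return True

def police(arrivals, policer, tag=True):
    """Apply a policer to (time, clp) cells; tag or drop the offenders."""
    out = []
    for t, clp in arrivals:
        if policer.conforms(t):
            out.append((t, clp))
        elif tag:
            out.append((t, 1))       # tagging: forward with CLP set to 1
        # otherwise the offending cell is discarded
    return out
```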

5.  ATM  Adaptation  Layer  (AAL) 

Referring  back  to  Figure  II. 2,  the  AAL  provides  services  to  applications  or  other 
transfer  protocols  not  found  in  the  ATM  layer.  To  minimize  the  number  of  AAL 
protocols  required,  ITU-T  Recommendation  1.121  defines  four  generic  service  classes', 
A-D,  based  on  three  application  service  requirements  [29]:  bit  rate  (constant  or  variable), 
the  timing  relationship  between  the  source  and  receiver  (required  or  not),  and  the 
connection  mode  (connectionless  or  connection-oriented).  These  service  classes  are 
more  general  than  the  previously  described  ATM  layer  service  classes  and  do  not  include 
either  formal  traffic  descriptors  or  QoS  parameters.  In  addition  to  these  application 
service requirements, ITU-T Recommendation I.362² provides example services that the
AAL  may  provide  to  enhance  the  ATM  layer  including  [30]:  handling  transmission 
errors,  segmentation  and  reassembly  to  map  user  data  to  the  48-octet  information  field  in 
ATM  cells,  handling  lost  and  misinserted  cells,  and  flow  and  timing  control. 

To  distinguish  between  data  handling  and  service  dependent  functionality,  the 
AAL  is  divided  into  two  sublayers.  The  convergence  sublayer  (CS)  provides  service- 
dependent  functions  and  a  service  access  point  (SAP)  for  applications.  Functionality 
within  the  CS  is  further  differentiated  into  the  service  specific  CS  (SSCS)  and  the 
common part CS (CPCS). Discussion here focuses on the CS as a composite entity. The
SSCS and CPCS are individually addressed only when required. The segmentation and
reassembly (SAR) sublayer segments user data to fit within the 48-octet length of the
ATM cell information field and reassembles user data correctly at the destination.

¹ Two other classes, X and Y, considered for a raw cell delivery service have been dropped.
² I.362 has been superseded by the ITU-T F.600 and F.700 Series recommendations.

Segmentation is shown in Figure II.6. The higher layer delivers a protocol data
unit to the CS sublayer. The CS sublayer adds either a header or a trailer or both and
pads the CS-PDU as required. The SAR breaks up the CS-PDU and optionally adds a header
and/or a trailer to each segment such that the resulting SAR-PDU is 48 octets in length.
The SAR-PDU then fits within a single ATM cell for transmission. At the receiver, each
of these steps is simply reversed.

Figure II.6: Segmentation at the AAL [18].

The  ITU-T  originally  proposed  five  AAL  protocols  [31],  Types  1  to  5,  but  later 
combined  Types  3  and  4.  The  relationship  between  the  generic  service  classes  proposed 
by I.121 and the AAL protocols is shown in Table II.2; the protocols do not necessarily
map  to  individual  service  classes. 


                  Class A     Class B     Class C            Class D
Timing Relation   Required    Required    Not Required       Not Required
Bit Rate          Constant    Variable    Variable           Variable
Connection Mode   Connection Oriented (Classes A-C)          Connectionless
AAL Protocol      Type 1      Type 2      Type 3/4, Type 5   Type 3/4

Table  II.2:  AAL  Protocol  Mapping  to  Service  Classes  [18]. 

The AAL protocols in Table II.2 map in an interesting manner to the ATM layer
service classes shown in Table II.1. The most widely used protocols are AAL1 and
AAL5. AAL1 is for connection oriented CBR traffic, matching the ATM layer CBR
service. AAL5 is also connection oriented but supports VBR traffic. AAL5 assumes
higher layers perform connection management and that the ATM layer produces minimal
errors. As a result, AAL5 has low processing and transmission overhead and adapts well
to existing transport protocols, such as TCP. These features make AAL5 the most
versatile AAL protocol, and AAL5 is used with all of the non-real-time ATM layer
services.

The remaining ATM layer service is rt-VBR. AAL1 is not appropriate for rt-VBR.
For reasons stated above, AAL5 is the simplest protocol for transmitting video.
AAL3/4 provides better support for streaming data with low delay. However, AAL3/4
integrates poorly with most processor architectures [32], is more complicated than AAL5,
and demands more processing and increased overhead. For these reasons, AAL3/4 seems
relegated to specialized applications and has largely been replaced by AAL5. AAL2
appears the most appropriate choice, but delays in developing the specification have
slowed its employment. Choosing the correct protocol depends on the specific application
and a reasonable expectation of vendor support. A more complete description of each
protocol is available in [18] except for AAL2, which is covered by ITU-T Recommendation
I.363.2 [33].

6.  ATM  Multicast 

Based  on  end-to-end  connectivity,  multimedia  applications  fall  into  three 
categories:  point-to-point,  point-to-multipoint,  and  multipoint-to-multipoint.  Multimedia 
applications  such  as  videophone  or  Internet  telephony  fall  into  the  point-to-point 
category.  Video  on  demand  or  remote  broadcasting  falls  into  the  point-to-multipoint 
category. Finally, video conferencing falls into the multipoint-to-multipoint category. The
latter two categories present a great challenge due to the need to efficiently switch video streams to
avoid  network  loading  and  the  additional  delay  added  by  cell  duplication  or  readdressing 
[34].  The  approach  taken  in  ATM  is  somewhat  different  from  multicast  IP  due  to  ATM's 
virtual  circuit  structure. 

The  ATM  UNI  3.1  standard  [23]  specifies  both  point-to-point  connections  and 
point-to-multipoint  connections.  The  motivation  behind  a  point-to-multipoint  connection 
is  to  conserve  bandwidth  by  minimizing  the  number  of  VCIs  required  within  the  NNI. 
For example, if an end-user wishes to transmit to N other users, separate point-to-point
connections would require N separate VCIs, each with the same bandwidth requirements.
A  point-to-multipoint  connection  allows  VCIs  to  be  consolidated  within  the  NNI  when 
they  have  common  end-points.  A  point-to-multipoint  VCC  has  the  following  properties. 
First,  the  multicast  group  resembles  a  tree  with  the  sender  as  the  root  node  and  the 
receivers  as  leaf  nodes.  Second,  the  connection  between  the  root  and  the  leaves  is 
defined  by  a  single  VPI/VCI  at  the  UNI.  Cells  transmitted  by  the  root  are  received  by  all 
of  the  leaves,  assuming  no  losses  in  transmission.  No  bandwidth  is  allocated  for 
transmission  from  the  leaves  to  the  root;  the  connection  is  one-way.  A  one-way 
connection  is  required  since  the  root  node  has  no  mechanism  for  filtering  data  from  each 
leaf over a single VCI³. Under UNI 3.1, a point-to-multipoint connection is set up as a
point-to-point connection between the sender and the first leaf node. The root node then
adds additional leaves until the multicast group is complete. Leaf nodes may be dropped,
either by their own request or by the root node, but leaves may not add themselves to the
circuit. A point-to-multipoint multicast scenario is shown in Figure II.7.

³ This is possible using AAL3/4 but does not appear to be practical.


Figure II.7: ATM Point-to-multipoint Multicast.

UNI  3.1  does  not  provide  a  specification  for  a  multipoint-to-multipoint 
connection.  A  multipoint-to-multipoint  VCC  has  properties  similar  to  the  point-to- 
multipoint  with  an  important  difference.  The  connection  is  defined  by  a  single  VPI/VCI 
at  the  UNI.  All  cells  transmitted  by  one  endpoint  of  the  connection  are  delivered  to  all 
other  endpoints  and  the  endpoint  is  capable  of  receiving  cells  over  the  same  VCC  from 
any  of  the  other  connected  endpoints.  This  duplex  transmission  leads  to  several 
difficulties  [35].  First,  data  cells  from  different  sources  arrive  at  the  endpoint  interleaved 
and must be properly reassembled by the AAL. AAL1 and AAL5 do not provide this
capability [31]. AAL3/4 has a multiplexing identifier (MID) field that allows
multiplexing within a VCI, but there is no standard for assigning MID values. The small
size of the MID field restricts multicast group size, and AAL3/4 requires a great deal of
overhead [18][32]. The second problem is resource management. A VCC is granted
only  if  sufficient  network  resources  exist  over  the  transmission  path.  With  a  multipoint- 
to-multipoint  connection,  the  VCC  is  shared  by  a  number  of  sources  and  determining  the 
bandwidth  requirements  is  difficult. 

Various  proposals  have  been  made  to  implement  multipoint-to-multipoint 
connections  within  ATM.  The  simplest  method  for  implementing  a  multipoint-to- 
multipoint  connection  is  a  "forest  of  trees,"  that  is,  using  a  point-to-multipoint  connection 
per endpoint [32][36]. With N endpoints, every endpoint is the root of a point-to-multipoint
connection with N - 1 leaves. A "forest of trees" offers low latency per
network  node,  but  a  member  entering  or  exiting  from  the  multicast  group  causes  a  burst 
of  signal  messages.  This  approach  is  specified  by  the  ITU-T  H.3XX  multimedia 
conferencing  standards.  Another  approach  is  to  use  a  server  as  an  intermediary  [37]. 
Each  endpoint  transmits  data  over  a  point-to-point  connection  with  the  server.  The  server 
relays  the  data  to  the  other  endpoints  through  a  point-to-multipoint  connection  for  which 
it  is  the  root  node.  The  Shared  Many-to-many  ATM  ReservaTions  Protocol  (SMART) 
[35]  is  a  novel  ATM  layer  level  protocol  that  regulates  access  to  the  multicast  tree. 
SMART  requires  only  one  VCC  for  the  entire  multicast  group  although  more  VCCs  are 
allowed  to  support  concurrent  data  transfer  by  two  or  more  endpoints.  Access  to  the 
shared  VCC  is  provided  by  a  grant  mechanism  implemented  in  a  round-robin  fashion. 
The  SMART  protocol  has  proven  viable  for  multicast  VTC  traffic  with  suitable 
modifications  to  the  grant  mechanism  to  account  for  the  needs  of  real-time  traffic  [38]. 
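
The connection counts implied by these alternatives can be compared with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the server case assumes that every endpoint is a leaf of the server's single point-to-multipoint tree.

```python
def point_to_point_mesh(n: int) -> int:
    """One-way point-to-point VCCs needed for each of n endpoints to reach all others."""
    return n * (n - 1)


def forest_of_trees(n: int) -> dict:
    """One point-to-multipoint connection per endpoint, each with n - 1 leaves."""
    return {"p2mp_connections": n, "leaves_per_tree": n - 1}


def server_relay(n: int) -> dict:
    """Each endpoint sends to the server; the server relays over one tree.

    Assumption: all n endpoints are leaves of the server's tree.
    """
    return {"p2p_connections": n, "p2mp_connections": 1, "leaves_on_server_tree": n}


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, point_to_point_mesh(n), forest_of_trees(n), server_relay(n))
```

The comparison shows why a change in group membership is costly for the "forest of trees": adding one endpoint touches every existing tree, whereas the server approach adds one point-to-point connection and one leaf.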

C.         WIRELESS  NETWORKS 

The  typical  military  wireless  network  is  based  on  packet-radio  technology  that 
extends  the  concept  of  the  point-to-point  packet-switched  network  to  a  broadcast  radio 
medium.  Like  some  LAN  standards,  such  as  Ethernet,  the  radio  channel  is  inherently  a 
multiple-access  medium  that  provides  a  much  less  reliable  transfer  medium  than  that 
experienced in wireline networks. As shown in Figure II.8, the data link control (DLC)
layer provides service to higher layer protocols, such as IP and ATM, by transferring data
in  packets  or  cells  over  the  radio  medium.  The  DLC  specifically  provides  reliable 
transfer  of  information  across  the  physical  link  and  regulates  access  to  the  shared 
medium.  The  functionality  of  the  DLC  is  separated  into  the  logical  link  control  (LLC) 
and  medium  access  control  (MAC)  sublayers.  While  the  functionality  of  the  layers 
shown in Figure II.8 is described briefly below, a more thorough discussion of
packet-based radio networks can be found in [39].


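[Figure: protocol stack showing, from top to bottom, the upper-level protocols (ATM, IP), the DLC layer with its LLC and MAC sublayers, and the physical layer.]

Figure II.8: DLC for a Packet-Based Radio Network.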

1.  Logical  Link  Control 

The  LLC  layer  provides  an  interface  to  the  network  layer,  either  IP  or  ATM  for 
example,  and  performs  error  and  flow  control.  Error  control  involves  providing 
mechanisms  for  responding  to  errors  in  transmitted  frames  while  flow  control  regulates 
the  flow  of  frames  to  ensure  the  sender  does  not  overwhelm  the  receiver.  Errors  occur 
due  to  bit  or  burst  errors  during  transit,  which  either  damage  the  frame  or  cause  the  frame 
to  be  unrecognizable.  Error  control  is  usually  provided  by  an  automatic  repeat  request 
(ARQ)  mechanism  that  combines  error  detection  from  the  MAC  with  positive  and 
negative  acknowledgements  and  retransmission  after  timeout.  For  real-time  traffic,  the 
viability  of  the  ARQ  mechanism  depends  on  the  overall  delay  budget,  and  the  LLC  may 
confine  itself  to  dropping  the  corrupt  data  packets.  The  LLC  layer  may  also  attempt  to 
correct errors if forward error correction (FEC) coding is employed. Another possibility
is  to  perform  power  management  at  the  LLC  layer  to  vary  transmission  power  in  response 
to observed error rates.

2. Medium Access Control

The  MAC  governs  access  to  the  transmission  medium,  performs  conflict 
resolution  and  provides  error  detection.  A  MAC  protocol  is  either  centralized,  where  a 
controller  grants  access  to  the  network,  or  decentralized  wherein  all  stations  dynamically 
determine  access.  Various  protocols  are  available  to  control  access  including  round  robin 
or  polling,  reservation,  and  contention.  With  polling  protocols,  each  station  is  given  an 
opportunity  to  transmit  in  turn.  Reservation  schemes  are  more  suitable  for  stream  traffic 
and  divide  access  time  into  slots,  which  allows  stations  to  reserve  slots  when  data  is  ready 
for transmission. In contention schemes, which suit bursty traffic, all stations
attempt to seize control of the medium and back off when collisions occur. Contention
works well only for lightly loaded networks. Of the three schemes, reservation provides
the  greatest  throughput  and  least  delay  for  integrated  wireless  networks.  Slot-based 
reservation  schemes  for  wireless  ATM  networks  and  mobile  IP  networks  have  been 
proposed  by  [39]  and  [40],  respectively. 

Referring back to Figure II.8, information flows in the following manner. The
network  layer  passes  cells  or  packets  to  the  LLC.  The  LLC  appends  a  control  header, 
creating  an  LLC-PDU.  The  control  header  provides  the  data  required  for  flow  control 
and  error  control.  The  LLC-PDU  is  passed  to  the  MAC,  which  assembles  a  frame 
containing  one  or  more  LLC-PDUs  along  with  address  and  error  detection  fields.  Once 
access  is  granted  to  the  radio  medium,  the  frame  is  transmitted  in  order  by  the  physical 
layer. 

3.  Physical  Layer 

The  physical  layer  specifies  the  transmission  medium,  signal  encoding, 
synchronization,  and  bit  transmission/reception.  Although  the  MAC  layer  determines 
access to the channel, a wideband radio channel may be divided in several ways [41].
The simplest is time division multiple access (TDMA) in which a sender transmits during
a fixed time slot. The channel may also be split into several independent, smaller
channels using frequency division multiple access (FDMA) or code-division multiple
access (CDMA) to allow multiple users to transmit simultaneously. Finally, TDMA may
be combined with either FDMA or CDMA.

D.         LAYERED  VTC  OVER  ATM 

1.  ITU-T  Multimedia  Standards 

The  ITU-T  H-series  recommends  several  standards  for  real-time  multimedia 
communications,  each  targeting  a  different  network  architecture.  The  standards  proposed 
for  ATM  networks  are  briefly  reviewed  to  provide  some  motivation  for  the  layered  VTC 
over  ATM  implementations  proposed  in  this  dissertation. 

Each  ITU-T  H-series  multimedia  conferencing  standard  associates  a  set  of  video, 
audio,  multiplex,  and  control  standards  into  a  multimedia  terminal  [42].  Each  terminal 
provides  point-to-point,  real-time  audio  and  video  conferencing  at  various  levels  of 
quality  with  provisions  for  optional  data  transfer.  Data  transfer  possibilities  include 
graphics,  still  images,  and  control  signals  such  as  those  needed  for  remote  camera 
operation.  Extensions  to  the  base  standards  allow  multipoint  operation  and  encryption 
with  appropriate  network  support.  The  ITU-T  standards  have  found  wide  acceptance, 
and  hardware  implementations  are  readily  available  in  PCI  and  compact  PCI  card 
formats.  Two  ITU-T  standards  address  ATM  networks:  H.321  and  H.310. 

H.321 is a first generation standard and adapts the earlier H.320 recommendation
for ISDN networks to ATM [42][43]. As expected from a standard adapted from ISDN
networking, H.321 allocates bandwidth in increments of 64 kbps. The baseline video
codec specified is H.261, which compresses color video at a constant bit rate in
increments of 64 kbps. H.261 supports two resolutions: CIF (352x288 pixels) and QCIF
(176x144 pixels). Baseline audio is compressed using the G.711 log-PCM codec,
providing low-delay, toll-quality narrowband audio at 64 kbps. H.321 uses the AAL1
protocol to support data channels equivalent to ISDN 'B' channels by mapping one 'B'
channel per VCC.

H.310  is  a  native  standard  for  videoconferencing  over  ATM/B-ISDN  and  includes 
the earlier H.321 as a subset [42][44]. Figure II.9 gives a simplified functional
description of an H.310 terminal and associated standards for multiplexing, call
establishment,  and  data  transfer.  Taking  advantage  of  the  high  bandwidth  available  in  B- 
ISDN  networks,  H.310  offers  high-quality  video  using  the  MPEG-2  video  codec  and 
high-quality audio using Layer II MPEG-1 audio. For interoperability with H.321
terminals, H.261 video and G.711 audio are also supported; H.263 video, a codec
optimized for low bit rate channels such as analog modems, is optional. H.310 terminals
support a variety of data rates, but all terminals are required to support common rates of 6.144 and 9.216
Mbps.  Calls  are  established  by  creating  an  initial  VCC  to  set  up  a  control  channel.  This 
control  VCC  uses  the  AAL5  protocol.  Once  two  terminals  have  established  a  set  of 
operating  parameters,  a  second  VCC  is  created  to  carry  multiplexed  audio  and  video. 
Either the AAL1 or AAL5 protocol is used. Additional VCCs may be established to
carry data traffic.

Figure II.9: ITU-T H.310 B-ISDN Terminal.

2.  Layered  Video  Considerations 

Compared  to  the  ITU-T  terminal  recommendations,  layered  video  poses  a 
different set of considerations in determining a feasible network interface. Chief among
these is the desire to enable the ATM layer to discern which video layer owns an
individual  cell.  Associating  layers  with  individual  cells  allows  an  ATM  switch  to  exploit 
the  hierarchical  nature  of  layered  video  through  scheduling  to  actively  control  congestion 
while  maintaining  the  best  possible  end-to-end  video  quality.  Another  benefit  is  offering 
recipients  the  ability  to  subscribe  to  any  number  of  layers  they  initially  choose  as  well  as 
a  means  to  add  or  drop  layers  during  the  session.  This  is  the  core  promise  of  RLM  [45]. 

A  secondary  concern  is  to  allow  the  network  to  identify  logical  elements  within 
the  video  stream,  such  as  the  frame  header  and  group-of-block  (GOB)  boundaries  (see 
Figure III.1). Locating GOB boundaries provides another dimension to network
scheduling  by  allowing  the  switch  to  identify  cells  that  will  not  aid  video  reconstruction 
at  the  recipient  due  to  previous  cell  losses  (see  Chapter  VI).  Two  approaches  for 
allowing  identification  of  video  layers  at  the  ATM  layer  are  proposed  here.  The  first  is  to 
assign  each  video  layer  to  a  separate  VCC.  The  second  requires  multiplexing  individual 
layers  over  a  single  VCC.  Each  approach  impacts  the  network  interface  design 
differently:  the  most  appropriate  AAL  protocol,  schemes  for  manipulating  the  ATM  cell 
header,  and  the  manner  in  which  the  multipoint-to-multipoint  connection  is  established. 
GOB  identification  is  considered  only  briefly  here;  more  details  are  provided  in  Chapter 
VI.  No  attempt  is  made  to  provide  a  complete  multimedia  terminal  specification  such  as 
H.310. Instead, the goal is to demonstrate the feasibility of supporting layered video
within  existing  ATM  standards. 

In  addition  to  the  layering  scheme,  the  choice  of  AAL  protocol  depends  on  the 
services  required  by  the  application.  Here,  we  assume  that  the  audio  and  video  streams 
are not multiplexed as they are in H.310. Segregating the streams allows different service
for  audio  and  video  and  simplifies  network  scheduling  with  respect  to  the  layered  video. 

We first consider the audio stream. The tactical scenario requirements (see Table
I.1) limit the audio stream bit rate to 8 kbps. The G.711 and MPEG-1 Layer II codecs
exceed this limit and are therefore incompatible with the scenario requirements. This is
not surprising since H.310 targets B-ISDN. However, other high-quality narrowband audio
codecs are available that specifically target low bit rates. Two suitable codecs specified
in the H.324 recommendation for low-bit-rate circuit-switched networks, such as the PSTN,
are the G.723.1 and G.729 codecs. G.723.1 transmits at either 5.3 or 6.4 kbps and offers
near-toll-quality speech although codec delay is rather large for VTC applications [5].
G.729 offers higher quality and lower coding delay for a similar level of complexity. Both
codecs offer silence detection to reduce bit rate by either not transmitting or transmitting
only background noise. Of the two, G.729 appears to be the better choice for the tactical
scenario considered here. Given that G.729 transmits at a fixed bit rate, the AAL1
protocol appears to be best suited.
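
A short calculation illustrates what carrying a low-rate CBR audio stream over AAL1 implies. The sketch assumes an 8 kbps G.729 stream and the standard AAL1 cell layout, in which one of the 48 payload octets is the SAR header and 47 carry user data. The function name and the optional partial-fill parameter are hypothetical and are not taken from the text.

```python
ATM_CELL_OCTETS = 53       # 5-octet header plus 48-octet information field
AAL1_PAYLOAD_OCTETS = 47   # 48 octets minus the 1-octet AAL1 SAR header


def aal1_audio_budget(codec_bps: float, fill_octets: int = AAL1_PAYLOAD_OCTETS) -> dict:
    """Cell rate, cell fill delay, and line rate for a CBR audio stream over AAL1.

    `fill_octets` models partially filled cells (assumed option): fewer audio
    octets per cell shorten the fill delay at the cost of a higher cell rate.
    """
    if not 0 < fill_octets <= AAL1_PAYLOAD_OCTETS:
        raise ValueError("fill_octets must be between 1 and 47")
    fill_delay_s = fill_octets * 8 / codec_bps
    cells_per_s = codec_bps / (fill_octets * 8)
    return {
        "fill_delay_ms": fill_delay_s * 1000.0,
        "cells_per_s": cells_per_s,
        "line_bps": cells_per_s * ATM_CELL_OCTETS * 8,
    }


if __name__ == "__main__":
    print(aal1_audio_budget(8000))       # fully filled cells
    print(aal1_audio_budget(8000, 10))   # one 10-octet (10 ms) G.729 frame per cell
```

At 8 kbps a fully filled cell takes 47 ms to assemble, which consumes a noticeable share of a real-time delay budget; sending one 10 ms frame per cell cuts the fill delay but raises the cell rate to 100 cells/s.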

The  question  for  the  video  stream  is  not  which  codec  to  use,  since  a  layered  coder 
is  assumed,  but  the  type  of  rate  control  to  employ.  Three  options  are  possible:  CBR, 
VBR  with  no  constraints,  and  VBR  with  bit-rate  constrained  to  a  predetermined  average. 
Assuming  a  fixed  quantization  scheme  at  the  encoder,  compressed  video  is  naturally 
VBR  since  compression  gain  varies  frame-to-frame.  Bit  rate  constraints  come  at  the  cost 
of  quality  variations  [46].  CBR  tends  to  show  larger  fluctuations  in  visual  quality  relative 
to  VBR  and  may  be  unappealing  at  low  bit  rates.  VBR  with  a  predetermined  mean-bit- 
rate  demonstrates  quality  fluctuations  between  VBR  and  CBR.  As  indicated  above,  VBR 
streams  have  another  advantage  in  that  bandwidth  can  be  conserved  through  statistical 
multiplexing,  a  significant  advantage  in  low  bit  rate  networks.  However,  resource 
allocation  is  simpler  if  the  mean  bit  rate  is  constrained  since  ATM  traffic  descriptors, 
such  as  PCR  and  SCR,  are  easier  to  determine.  For  these  reasons,  the  video  stream  is 
assumed to be VBR constrained to a predetermined mean bit rate. Only the AAL1
protocol is rendered unsuitable by this assumption, and choosing among the remaining
protocols  depends  on  limitations  introduced  by  video  layering  as  discussed  below. 

The  last  issue  to  consider  is  that  of  synchronization  of  the  audio  and  video 
streams.  We  assume  that  if  the  application  is  given  suitable  timing  information  for  each 
stream,  then  it  is  capable  of  synchronizing  playback.  Timing  information  is  either 
provided  to  the  application  by  the  AAL  or  determined  directly  using  time-stamps 
embedded in the application PDU. The former approach is available only if AAL1 or
AAL2 is used. The latter is offered by encapsulating application data within an RTP
packet. Each RTP packet includes a 32-bit timestamp corresponding to the time when
the  first  octet  of  data  was  generated.  The  exact  approach  taken  in  this  work  is  outlined 
below. 
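
As an illustration of timestamp-based synchronization, the sketch below converts 32-bit RTP timestamps into media time and computes the skew between the two streams. It assumes a 90 kHz video clock and an 8 kHz audio clock, and it assumes that the first timestamp of each stream corresponds to the same capture instant; in a deployed RTP session that alignment would come from RTCP sender reports. All function names are hypothetical.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits wide and wrap around


def ts_delta(later: int, earlier: int) -> int:
    """Signed timestamp difference that tolerates 32-bit wraparound."""
    d = (later - earlier) % RTP_TS_MODULUS
    return d - RTP_TS_MODULUS if d >= RTP_TS_MODULUS // 2 else d


def media_time(ts: int, first_ts: int, clock_hz: int) -> float:
    """Seconds of media elapsed since the first packet of the stream."""
    return ts_delta(ts, first_ts) / clock_hz


def av_skew(video_ts, video_first, audio_ts, audio_first,
            video_hz=90000, audio_hz=8000) -> float:
    """Positive result: the video sample is later in media time than the audio sample."""
    return (media_time(video_ts, video_first, video_hz)
            - media_time(audio_ts, audio_first, audio_hz))
```

A playout routine could delay whichever stream is ahead by the reported skew before presenting it.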

3.         Multiple  VCC  Case 

In  the  multiple  VCC  approach,  each  video  layer  is  assigned  a  separate  VCC  and  is 
readily identified within the network by its VPI/VCI pair. For scheduling purposes, a
switch needs to logically associate the VPI/VCI pairs transporting the video layers from a
particular sender and to establish a hierarchy for priority service. A simple means of
logically associating layers is to assign one VPI per sender or to negotiate VPI/VCI pairs
in contiguous blocks⁴. Using multiple VCCs conveys several advantages. Using
individual  VCCs  allows  a  great  deal  of  flexibility  in  providing  service  on  a  per-layer 
basis.  The  sender  can  negotiate  different  service  and  different  QoS  for  each  individual 
layer,  even  in  the  absence  of  a  dedicated  scheduling  algorithm  for  layered  video. 
Using multiple VCCs also simplifies the task of allowing end users to subscribe to individual
layers  at  call  setup  and  dynamically  add  or  drop  layers  once  the  VTC  is  in  progress.  A 
penalty  is  paid  due  to  the  large  number  of  connections.  Call  setup  time  is  increased  and 
changes  to  the  multipoint-to-multipoint  connection  incur  a  proportionate  increase  in 
signaling  amongst  the  end-points. 
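
One way a switch might associate VCCs with layers, assuming the contiguous-block convention mentioned above, is to derive the layer index from the offset of the VCI within the block. The sketch below is illustrative only: `LayerBlock`, `layer_of`, and `admit` are hypothetical names, and the rule of forwarding only layers up to a congestion-dependent ceiling is an assumed policy rather than the scheduling algorithm developed in this work.

```python
from typing import NamedTuple, Optional


class LayerBlock(NamedTuple):
    """A sender's layers carried on one VPI in a contiguous block of VCIs."""
    vpi: int
    base_vci: int   # VCI of the base layer (layer 0)
    layers: int     # number of layers in the block


def layer_of(block: LayerBlock, vpi: int, vci: int) -> Optional[int]:
    """Return the video layer index for a cell, or None if it is not in the block."""
    if vpi != block.vpi:
        return None
    index = vci - block.base_vci
    return index if 0 <= index < block.layers else None


def admit(block: LayerBlock, vpi: int, vci: int, max_layer: int) -> bool:
    """Forward a cell unless it belongs to a layer above the current ceiling."""
    layer = layer_of(block, vpi, vci)
    if layer is None:
        return True   # cell is not part of this video session; leave it alone
    return layer <= max_layer
```

Lowering `max_layer` under congestion sheds the least important enhancement layers first while the base layer continues to flow.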

Service  is  provided  to  each  layer  using  the  AAL5  protocol.  AAL5  offers  the 
lowest  overhead  of  the  VBR  protocols,  eight  octets  per  CS-PDU  and  no  additional 
overhead  in  the  SAR-PDUs.  It  is  also  the  most  appropriate  choice  if  a  higher-level 
protocol,  such  as  RTP,  is  employed. 

Data transfer proceeds as shown in Figure II.10. The video compressor relays
application PDUs to the AAL after time-stamping each to facilitate synchronization
with the audio layer. An application PDU consists of a single GOB, multiple GOBs, or an
entire frame. The choice depends on the manner in which frame elements are exploited
by the coder. In the CS sublayer, an eight-octet trailer is appended, and the CS-PDU is
padded out to a multiple of 48 octets. The trailer includes the CPCS user-to-user
indication field, which allows transparent transfer of user information between end-users
or application layers. The user-to-user indication field identifies the video layer (0 =
base, 1 = first enhancement layer, and so on), which enables the end application to
associate each incoming VCC with a layer and correctly reassemble the video stream.
The SAR sublayer segments the CS-PDU into 48-octet SAR-PDUs; no headers or trailers
are necessary. At the ATM layer, each SAR-PDU is encapsulated into an ATM cell
information field.

⁴ Negotiating VPIs and/or VCIs is not supported in UNI 3.1 but is supported by UNI 4.0 [47].
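
The CS and SAR steps just described can be sketched as follows. The trailer layout used here (user-to-user indication, common part indicator, length, CRC-32) follows the standard AAL5 format, but the CRC is computed with `zlib.crc32` purely as a stand-in and is not claimed to be bit-exact with the AAL5 CRC. The function names are hypothetical.

```python
import struct
import zlib

CELL_PAYLOAD = 48   # octets in the ATM cell information field
TRAILER_LEN = 8     # octets in the AAL5 CS-PDU trailer


def build_cs_pdu(app_pdu: bytes, layer: int) -> bytes:
    """Pad an application PDU and append a trailer carrying the video layer."""
    if not 0 <= layer <= 255:
        raise ValueError("layer must fit in the one-octet user-to-user field")
    if len(app_pdu) > 0xFFFF:
        raise ValueError("payload too long for the 16-bit length field")
    # Pad so that payload + pad + trailer is a multiple of 48 octets.
    pad_len = -(len(app_pdu) + TRAILER_LEN) % CELL_PAYLOAD
    head = app_pdu + bytes(pad_len) + struct.pack("!BBH", layer, 0, len(app_pdu))
    crc = zlib.crc32(head) & 0xFFFFFFFF   # placeholder checksum (assumption)
    return head + struct.pack("!I", crc)


def segment(cs_pdu: bytes) -> list:
    """Split a CS-PDU into 48-octet SAR-PDUs; no SAR header or trailer is added."""
    return [cs_pdu[i:i + CELL_PAYLOAD] for i in range(0, len(cs_pdu), CELL_PAYLOAD)]
```

For example, a 100-octet GOB on the first enhancement layer needs 36 octets of padding, giving a 144-octet CS-PDU that occupies three cells.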


[Figure: at the AAL5 SAP, the application PDUs of each video layer are converted to CS-PDUs, segmented into SAR-PDUs, and carried as ATM cells on VPI/VCI(0), VPI/VCI(1), and VPI/VCI(2).]

Figure II.10: Transmitting Layered Video Using AAL5 and Multiple VCCs.

Since  the  AAL5  SAR  merely  segments  the  CS-PDU,  the  endpoint  CS  sublayer 
cannot distinguish between SAR-PDUs containing the CS-PDU payload and the
SAR-PDU containing the trailer that ends the CS-PDU. To distinguish between these cases,
the  SDU-type  bit  in  the  payload  type  field  is  used.  At  the  ATM  layer,  a  CS-PDU  consists 
of  zero  or  more  ATM  cells  with  the  SDU-type  bits  set  to  zero  followed  by  an  ATM  cell 
with  the  SDU-type  bit  set  to  one.  The  latter  indicates  the  presence  of  the  CS-PDU  trailer 
and  the  end  of  the  CS-PDU.  This  scheme  also  allows  the  network  to  determine  the 
boundaries of the application PDU by tracking changes in the SDU-type bit. Figure II.11
shows  how  a  GOB,  assuming  that  the  application  PDU  consists  of  a  single  GOB,  is 
located  within  the  ATM  cell  flow.  Therefore,  a  scheduling  algorithm  could  track  the 
SDU-type bit to incorporate GOB boundaries into scheduling decisions.

[Figure: ATM cell flow for one GOB, consisting of four cells with SDU = 0 followed by a final cell with SDU = 1.]

Figure II.11: Use of the SDU Bit to Locate Application PDU Boundaries with AAL5.
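
A minimal sketch of how a switch could use the SDU-type bit follows. Once a cell of an application PDU has been lost, the remaining cells of that PDU are discarded because they no longer aid reconstruction. Retaining the final cell (SDU-type bit set to one) so that the receiver can still find the PDU boundary is a design choice assumed here; the scheduling algorithm actually used is the subject of Chapter VI. The function name and input format are hypothetical.

```python
def forward_indices(cells):
    """Return the indices of the cells a switch forwards.

    `cells` is a list of (sdu_bit, lost) pairs in arrival order, where
    `sdu_bit` is 1 on the last cell of a CS-PDU and `lost` marks a cell
    dropped by the switch (for example, on buffer overflow).
    """
    forwarded = []
    damaged = False   # True once the current PDU has lost a cell
    for index, (sdu_bit, lost) in enumerate(cells):
        end_of_pdu = sdu_bit == 1
        if lost:
            damaged = True
        elif not damaged or end_of_pdu:
            # Undamaged PDUs pass in full; damaged ones keep only the last cell.
            forwarded.append(index)
        if end_of_pdu:
            damaged = False   # the next cell starts a new PDU
    return forwarded
```

The bandwidth freed by discarding the tail of a damaged GOB becomes available to cells that can still contribute to the reconstructed video.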

Establishing  a  multipoint-to-multipoint  connection  follows  the  procedures 
outlined  under  ATM  multicast  above  with  the  difference  that  a  separate  point-to- 
multipoint  connection  must  be  established  for  each  layer.  The  order  in  which 
connections  are  established  is  potentially  of  importance  if  the  network  possesses  limited 
resources  over  any  path  that  forms  part  of  a  connection.  To  preserve  the  hierarchical 
nature  of  video  layering,  the  first  point-to-multipoint  connection  established  should  be  the 
VCC  associated  with  the  base  layer.  In  turn,  VCCs  associated  with  the  enhancement 
layers  are  established,  one  by  one,  in  order  of  each  layer's  perceptual  importance.  While 
establishing  a  complete  set  of  connections  in  this  manner  entails  a  longer  setup  time  than 
negotiating  each  connection  simultaneously,  a  hierarchical  connection  order  prevents  lack 
of  resources  from  denying  a  connection  to  a  more  perceptually  important  layer  in  favor  of 
a  less  important  layer.  Therefore,  the  network  arbitrates  which  layers  receive  connections 
based  on  the  resources  present  over  all  paths  composing  the  point-to-multipoint 
connection.  If  an  endpoint  workstation  does  not  possess  the  capability  to  decode  all  the 
layers  comprising  the  video  session,  the  workstation  can  refuse  connection  to  unwanted 
layers. The individual endpoint should also deny connection in the case of an illegal
layering  arrangement.  This  may  happen  if  the  network  does  not  permit  a  connection  for  a 
layer  while  a  less  important  layer  is  allowed  to  establish  a  connection  due  to  smaller 
bandwidth  demands. 
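
The endpoint check for an illegal layering arrangement can be sketched as follows. This is an illustrative sketch only; it assumes the endpoint knows, for each layer in order of importance, whether the network granted a connection, and the function names are hypothetical.

```python
def usable_layers(admitted):
    """Count the layers an endpoint should accept.

    admitted[i] is True if a VCC was granted for layer i (0 = base layer).
    Only the longest unbroken run starting at the base layer is usable,
    because each enhancement layer depends on all layers below it.
    """
    count = 0
    for granted in admitted:
        if not granted:
            break
        count += 1
    return count


def is_legal(admitted):
    """A layering is legal if no layer is granted above a refused layer."""
    return not any(admitted[usable_layers(admitted):])


# Layer 1 was refused but layer 2 was granted: illegal, so the endpoint
# would keep the base layer and deny the connection for layer 2.
print(usable_layers([True, False, True]), is_legal([True, False, True]))
```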

4.  Single  VCC  Case 

The  case  for  limiting  the  layered  video  stream  to  a  single  VCC  is  driven  by  the 
desire  to  minimize  the  number  of  active  connections  in  the  multipoint-to-multipoint 
connection. While VCIs are not a scarce commodity - a single VPI can bundle as many
as 65536 VCIs with the values 0-31 reserved [23] - signaling and control requirements
increase  with  the  number  of  connections,  which  subsequently  increases  call  setup  time. 
An  alternative  approach  is  to  multiplex  cell  flows  from  each  layer  within  a  single  VCI. 
Multiplexing  flows  over  a  single  VCI  is  only  supported  by  AAL2  and  AAL3/4.  Since 
AAL3/4  has  been  largely  replaced  by  AAL5,  the  problem  of  supporting  a  single  VCC 
rests  on  determining  a  suitable  interface  between  the  application  layer,  the  AAL2 
protocol,  and  the  ATM  layer. 

Unlike  the  other  AAL  protocols,  AAL2  specifies  only  a  CS  sublayer  and  does  not 
utilize  a  SAR  sublayer  [33].  The  CS  sublayer  functionality  is  further  split  into  service- 
specific  (SSCS)  and  common  parts  (CPCS)  sublayers.  The  simplest  SSCS  definition  is 
the  null  SSCS  which  transfers  application  PDUs  directly  to  the  CPCS  sublayer.  Other 
definitions remain under study, and an SSCS definition for layered video traffic is
proposed  below.  The  CPCS  sublayer  multiplexes  individual  cell  flows  and  provides 
VBR  traffic  support. 

The  following  service  approach  is  proposed  to  adapt  AAL2  for  layered  video. 
Referring to Figure II.12, each layer is assigned a service access point (SAP) at the AAL
SSCS  sublayer.  The  application  PDU  consists  of  a  GOB,  a  contiguous  set  of  GOBs,  or  a 
frame  from  a  particular  layer.  The  application  PDU  is  buffered  within  the  SSCS  sublayer 
and  transmitted  in  blocks  to  the  CPCS  sublayer.  Block  size  is  set  at  44  octets  to  increase 
transmission  efficiency.  If  the  application  PDU  length  is  not  a  multiple  of  44  octets,  a 
variable length block is transmitted with length < 44 octets.

Figure II.12: Transmitting Layered Video Using AAL2 and a Single VCC.

The  CPCS  sublayer  accepts  blocks  from  each  SSCS  SAP  and  appends  a  three 
octet header to form a CPCS packet. Within the header, the Channel Identifier (CID)
uniquely  identifies  the  layer  number.  The  CID  field  is  8  bits  in  length  which,  after 
allowing  for  reserved  values,  permits  identification  of  up  to  248  individual  channels. 
Since  available  channel  numbers  start  at  8,  one  possible  scheme  is  to  start  numbering 
channels  with  CID  =  8  +  layer  number,  where  layer  numbers  start  at  zero  for  the  base 
layer.  The  length  indicator  field  is  set  to  reflect  either  a  fixed  payload  length  of  44  octets 
or  a  smaller,  variable  value  for  the  last  segment  in  the  application  PDU  if  the  application 
PDU is not an even multiple of 44 octets. The CPCS packet is then loaded into a
CPCS-PDU with an 8-bit start field header. If the length of the last CPCS packet is less than 47
octets, a trailer is added to pad the CPCS-PDU to 48 octets. The combined overhead of
the  CPCS  packet  header  and  the  CPCS-PDU  start  field  header  is  exactly  four  octets. 
Therefore,  a  block  size  of  44  octets  at  the  SSCS  sublayer  simplifies  processing  by  the 
AAL  since  each  CPCS  packet  and  associated  CPCS-SDU  is  transported  within  exactly 
one  ATM  cell. 
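
The segmentation and encapsulation steps can be illustrated with a short sketch. It is a simplification under stated assumptions: the three header octets here are a stand-in (CID, length, one spare octet) and do not reproduce the bit-level AAL2 header layout, the start field is a zero placeholder, and the function names are hypothetical. Only the sizes (44-octet blocks, three-octet header, one-octet start field, 48-octet payload) and the rule CID = 8 + layer number come from the text.

```python
BLOCK = 44          # SSCS block size in octets (from the text)
CELL_PAYLOAD = 48   # ATM cell information field in octets
FIRST_CID = 8       # first non-reserved channel number (from the text)


def cid_for_layer(layer):
    """Map a layer number (0 = base layer) to its channel identifier."""
    return FIRST_CID + layer


def segment(pdu):
    """Split an application PDU into 44-octet blocks; the last may be shorter."""
    return [pdu[i:i + BLOCK] for i in range(0, len(pdu), BLOCK)]


def build_cell_payload(layer, block):
    """Wrap one block into a 48-octet CPCS-PDU (simplified header layout)."""
    start_field = bytes([0])                               # placeholder value
    header = bytes([cid_for_layer(layer), len(block), 0])  # stand-in header
    body = start_field + header + block
    return body + bytes(CELL_PAYLOAD - len(body))          # zero-pad short blocks


blocks = segment(bytes(100))                 # a 100-octet PDU
print([len(b) for b in blocks])              # two full blocks and a remainder
print(len(build_cell_payload(1, blocks[-1])))
```

With a full 44-octet block no padding is added, which is the one-packet-per-cell property the paragraph above relies on.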

An  alternate  approach  that  reduces  overhead  is  to  buffer  application-PDUs  at  the 
SSCS  sublayer.  Each  application-PDU  is  segmented  into  44-octet  blocks  as  before  and 
transmitted  to  the  CPCS  sublayer.  If  an  application-PDU  is  not  an  even  multiple  of  44 
octets,  the  leftover  bits  are  retained  at  the  head  of  the  SSCS  buffer.  When  the  next 
application  PDU  is  buffered  and  segmented  at  the  SSCS  sublayer,  data  from  the  last 
application-PDU  is  encapsulated  into  the  first  CPCS  packet.  Although  this  approach 
transmits  data  from  different  application-PDUs  in  the  same  ATM  cell,  overhead  is 
reduced  considerably  since  every  CPCS  packet  is  filled  to  44-octets,  obviating  the  need  to 
ever  pad  the  CPCS-SDU. 
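
A minimal sketch of this carry-over buffering is given below. The class name `SscsBuffer` is hypothetical, data are handled in whole octets rather than bits for simplicity, and no policy is shown for flushing the final remainder at the end of a session, since the text does not specify one.

```python
BLOCK = 44  # SSCS block size in octets (from the text)


class SscsBuffer:
    """Per-layer SSCS buffer that only ever emits full 44-octet blocks."""

    def __init__(self):
        self.leftover = b""   # tail of the previous application PDU

    def push(self, pdu):
        """Accept one application PDU and return the full blocks now available."""
        data = self.leftover + pdu
        full = len(data) - len(data) % BLOCK
        self.leftover = data[full:]            # retained for the next PDU
        return [data[i:i + BLOCK] for i in range(0, full, BLOCK)]


buf = SscsBuffer()
first = buf.push(bytes(100))    # 2 full blocks, 12 octets retained
second = buf.push(bytes(76))    # 12 + 76 = 88 octets -> 2 full blocks
print(len(first), len(second), len(buf.leftover))
```

The trade-off is latency: the tail of a PDU is not transmitted until the next PDU for that layer arrives.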

At  the  destination  AAL,  the  CPCS  sublayer  strips  the  SF  header  off  the  CPCS- 
PDU  and  reads  the  CID  field  within  the  CPCS  packet  header  to  route  the  payload 
appropriately  to  the  SSCS  sublayer.  No  specific  functionality  is  envisioned  for  the 
receiver  side  of  the  SSCS  sublayer.  The  SSCS  sublayer  merely  accepts  the  payload  from 
the  CPCS  sublayer  and  forwards  it  to  the  application  layer.  There  is  no  need  to  recreate 
the  application  PDU  since  the  decoder  is  assumed  to  be  capable  of  interpreting  the  raw 
bit  stream. 

The  above  approach  allows  the  cell  flows  of  each  layer  to  be  multiplexed  over  a 
single  VCC.  However,  the  network  is  unable  to  distinguish  between  the  different  flows  if 
the  only  indication  lies  within  the  ATM  cell  information  field.  As  ATM  switches  only 
read  cell  headers,  layer  designation  must  occur  using  fields  within  the  cell  header  as 
shown in Figure II.4. By design, ATM cell headers are relatively small, incorporating
only  the  information  required  for  ATM  switches  to  perform  their  switching  and 
congestion  control  functions.  Therefore,  the  sender  has  very  little  flexibility  in  setting 
individual fields within the header that are not subject to being overwritten by switches.
However, the SDU-type bit and the CLP bit are available to the user [23]. Used together,
the two bits allow indication of up to four layers (although only three layers are employed
here) as indicated in Table II.3. The CLP bits are enabled for the lower priority layers.
Setting  the  CLP  bit  does  not  necessarily  indicate  cells  from  enhancement  layers  are 
automatically  dropped  during  periods  of  congestion.  The  user  is  allowed  to  negotiate 
QoS  separately  for  the  cell  flow  consisting  of  cells  with  the  CLP  bit  set  to  zero  and  the 
cell flow consisting of all cells (CLP = 0+1) [28]. Setting the CLP and SDU-type bits
requires extending AAL2 to communicate with the ATM layer in a manner similar to the
interaction between AAL5 and the ATM layer. A method to accomplish this is to
transfer the CID field value with the CPCS-PDU. The ATM layer uses the CID value to
determine an index into Table II.3, index = CID - 8, and sets the CLP and SDU bits
appropriately. 


Layer Number      SDU bit    CLP bit
0                 0          0
1                 1          0
2                 0          1
3 (not used)      1          1

Table II.3: ATM Cell-Tagging Scheme for Layered Video.
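
The table lookup performed by the ATM layer can be sketched as follows. The bit assignments are an assumption on my part: they follow the reading that the CLP bit is set for the two lower-priority layers and that Layer 2 cells carry SDU = 0 and CLP = 1, as in the Layer 2 GOB of Figure II.13. The function name is hypothetical.

```python
FIRST_CID = 8   # channel numbers start at 8, so index = CID - 8

# index (layer number) -> (SDU bit, CLP bit); assumed reading of Table II.3
TAGS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def cell_tags(cid):
    """Return the (SDU, CLP) pair the ATM layer would set for a given CID."""
    index = cid - FIRST_CID
    if not 0 <= index < len(TAGS):
        raise ValueError(f"CID {cid} does not map to one of the four layers")
    return TAGS[index]


print(cell_tags(8))    # base layer
print(cell_tags(10))   # layer 2
```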

In  the  multiple  VCC  case,  the  SDU-type  bit  is  available  and  enables  the  network 
to  determine  the  application  PDU  boundaries  in  order  to  incorporate  logical  video 
elements  such  as  a  GOB  or  frame  into  scheduling  decisions.  The  cell-tagging  scheme 
presented  in  Table  II.3  does  not  permit  a  similar  approach  at  the  network  level.  An 
alternative  approach  requires  the  AAL  to  segregate  CPCS-PDUs  resulting  from  each 
layer's application PDUs. The segregated CPCS-PDUs are then handed to the ATM
layer  and  transmitted  sequentially.  Since  each  CPCS-PDU  comes  from  the  same  channel, 
an  application  PDU  appears  to  the  network  as  a  contiguous  set  of  cells,  each  with  the 
same  cell-tags.  By  monitoring  changes  in  the  CLP  and  SDU-type  bits,  the  network  can 
identify application PDU boundaries. This approach is shown in Figure II.13. While
convenient, concatenating application PDUs within the VCC impacts scheduling
performance. This issue is covered in more detail in Chapter V.
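
Boundary detection by monitoring tag changes can be sketched with a run-length grouping. This is an illustration only, assuming each cell is represented as an (SDU, CLP) pair; the function name is hypothetical.

```python
from itertools import groupby


def pdu_runs(cells):
    """Split a multiplexed cell flow into runs of identically tagged cells.

    cells: sequence of (sdu_bit, clp_bit) tuples in arrival order.
    Returns a list of (tag, run_length) pairs, one per inferred PDU.
    """
    return [(tag, len(list(run))) for tag, run in groupby(cells)]


# Five cells: one tagged (1, 0), three tagged (0, 1), one tagged (0, 0).
flow = [(1, 0), (0, 1), (0, 1), (0, 1), (0, 0)]
print(pdu_runs(flow))
```

One limitation of this sketch is that two back-to-back PDUs from the same layer carry identical tags and merge into a single run, so the sender would need to interleave layers for the boundaries to remain visible.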

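[Figure: multiplexed ATM cell flow of five cells tagged, in order, (SDU = 1, CLP = 0), (SDU = 0, CLP = 1), (SDU = 0, CLP = 1), (SDU = 0, CLP = 1), and (SDU = 0, CLP = 0); a bracket labeled "Layer 2 GOB" marks one application PDU within the flow.]

Figure II.13: Identifying Application PDUs in a Multiplexed Cell Flow.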

Setting  up  a  multipoint-to-multipoint  connection  requires  each  sender  to  establish 
separate  point-to-multipoint  connections  for  the  audio  and  video  streams.  Compared  to 
the  multiple  VCC  approach,  creating  and  maintaining  a  VTC  session  with  a  single  VCC 
reduces  signaling  requirements.  However,  using  a  single  VCC  reduces  flexibility  in 
heterogeneous  networks.  When  the  initial  connection  is  established,  the  sender  must 
negotiate  acceptable  QoS  for  the  entire  video  stream.  While  this  appears  to  negate  the 
flexibility  offered  by  transmitting  layers,  the  sender  still  has  the  option  of  negotiating 
QoS  separately  for  the  CLP  =  0  and  CLP  =  0+1  cell  flows.  For  similar  reasons, 
individual  endpoints  cannot  refuse  individual  layers  at  call  setup  and  must  accept  the 
entire  video  stream  or  decline  the  connection.  Still,  it  is  desirable  to  allow  an  endpoint  to 
dynamically  drop  layers,  both  to  ensure  that  the  more  important  layers  arrive  and  to 
reduce  bandwidth  demands  within  the  network  if  no  downstream  nodes  require  certain 
layers.  Chapter  VI  proposes  a  scheme  that  allows  the  network  scheduler  to  effectively 
drop  individual  layers  within  a  VCC  when  no  destination  indicates  an  interest  in  those 
layers. 

This  chapter  examined  architectures  suitable  for  transporting  real-time,  interactive 
multimedia  information  streams.  A  suitable  network  architecture  needs  to  meet  the 
following  requirements:  multicast  support,  QoS  guarantees,  and  real-time  support.  The 
ensuing  discussion  indicated  that  only  ATM  networks  currently  meet  all  three 
requirements. Given that ATM is a viable networking architecture, two approaches are
presented to transmit layered video. The first approach assigns each layer to a separate
VCI  using  AAL5.  This  approach  is  the  most  versatile  in  allowing  network  access  to 
individual  layers;  it  scales  well  and  provides  easy  access  to  GOBs  within  each  layer.  The 
primary  drawback  is  the  increased  signaling  in  a  multicast  scenario  since  each  individual 
connection  represents  the  base  of  a  multicast  tree.  The  second  approach  multiplexes  each 
layer  across  a  single  VCI  using  AAL2.  This  approach  offers  quicker  call  setup  and 
minimizes  signaling  in  multicast  scenarios  but  requires  modification  to  the  CPCS 
sublayer to tag each cell with an appropriate identifier for each layer. In addition,
a single VCI cannot scale beyond four layers, and organizing the stream into recognizable
GOBs is somewhat complicated.

III.       VIDEO CODING TECHNIQUES

Even when considering the modest requirements outlined for the video
teleconferencing scenario presented in Chapter I, raw video signals are very bandwidth
intensive. Consider an example using the specifications listed in Table I.1 with gray-scale
video only. Sending an uncompressed gray-scale video stream at 8 bits per pixel requires
a bandwidth of approximately 2 Mbps; this is not an insurmountable requirement with a
dedicated wireline ATM network but is clearly excessive for tactical video
teleconferencing. Restricting the video stream to an average of 64 kbps requires a
compression gain of about 31 to 1 or an average bit allocation of 0.26 bits per pixel (bpp).
Transmitting a true-color video sequence over the same channel would require a
compression gain three times higher.
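
The figures quoted above follow from simple ratios. As a worked check, taking the 2 Mbps raw rate and the 64 kbps target from the text, and assuming that true color means three 8-bit components per pixel:

```latex
\[
\text{gain} \approx \frac{2\ \text{Mbps}}{64\ \text{kbps}} \approx 31,
\qquad
\text{bit allocation} \approx \frac{8\ \text{bpp}}{31} \approx 0.26\ \text{bpp},
\qquad
\text{gain}_{\text{color}} \approx 3 \times 31 = 93 .
\]
```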

This  chapter  presents  a  basic  discussion  of  hybrid  video  coding  and  includes 
transform  coding,  motion  compensation,  quantization,  and  entropy  encoding.  A  quick 
measure  for  quantifying  distortion  due  to  quantization  is  introduced  as  a  measure  of 
picture  quality.  The  MPEG  and  H.263  video  coding  standards  are  described  and 
examined  for  error  resilience.  Finally,  wavelet-based  image  compression  is  presented  in 
preparation  for  the  layered  video  discussion  in  the  next  chapter. 

A.         VIDEO  COMPRESSION  OVERVIEW 

Video  coding  involves  a  combination  of  removing  perceptually  redundant 
content,  representing  information  efficiently  through  lossless  coding,  and  exploiting 
frame-to-frame  correlation  within  a  video  sequence.  Motion  video  is  typically  low-pass 
in  nature;  the  human  eye  places  greater  relative  weight  on  lower  frequencies  than  higher 
frequencies  [6].  Therefore,  2-D  transform  methods  are  used  to  generate  an  equivalent 
frequency domain representation, a process that is lossless and invertible. Using this
representation, variances in human perception are exploited by quantizing the resulting
coefficients to different degrees of precision with more precision granted to the lower
frequencies. Quantization reduces the dynamic range of the coefficients, which results in
information  loss  but  enables  the  coefficients  to  be  represented  with  fewer  bits.  Usually, 
the  least  relevant  coefficients  are  zeroed  out  during  quantization,  thus  creating  runs  of 
zeros.  Since  there  is  little  need  to  explicitly  represent  the  zeros,  run-length  coding  is  used 
to  generate  a  more  compact  representation  that  is,  in  turn,  replaced  by  a  more  efficient, 
lossless  variable-length  coding  (VLC).  Taken  collectively,  these  techniques  are  referred 
to  as  spatial  compression  and  form  the  basis  of  image  compression  standards,  such  as 
JPEG. 

A  video  codec  must  compress  a  time-varying  video  sequence  consisting  of  a 
series  of  frames  spaced  at  equal  time  intervals.  The  codec  may  or  may  not  exploit  the 
temporal  dimension  depending  on  the  application  requirements.  The  simplest  approach  is 
to  ignore  any  correlation  between  individual  frames  and  compress  each  frame 
independently  as  if  it  were  a  still  image.  This  approach  is  known  as  intraframe  coding, 
and the resulting compressed frames are referred to as I-frames. An example is
Motion-JPEG, which uses JPEG to code individual frames. Intraframe coding offers the
advantage of error resilience since decode errors are always confined to the current
frame. However, compression is limited to about 0.5 bits/pixel at acceptable
image quality [6]. Higher compression gains are possible, for the same quality, by
exploiting  the  high  degree  of  correlation  that  video  frames  tend  to  exhibit  from  frame-to- 
frame.  Interframe  coding  removes  redundancy  by  only  coding  the  differences  between 
successive  frames.  When  these  differences  arise  due  to  motion,  interframe  coding  yields 
compression  gains  that  vary  in  relation  to  the  degree  and  type  of  motion.  Static  frames 
exhibit  a  high  degree  of  compression  while  rapid  motion  tends  to  degrade  compression 
performance.  The  drawback  to  interframe  coding  is  the  dependence  between  successive 
frames  at  the  decoder.  If  errors  occur  in  the  current  frame,  the  errors  tend  to  propagate 
temporally  between  successive  frames  as  well  as  spatially  within  the  frames.  Of  course, 
if  two  successive  frames  are  not  correlated,  perhaps  due  to  a  scene  change,  interframe 
coding performs no better - typically worse due to additional overhead - than intraframe
coding [7]. Therefore, video codecs, such as H.263 and MPEG, incorporate both types of
coding  for  efficiency  and,  in  some  cases,  to  place  an  upper  bound  on  error  propagation. 
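
The dependence between successive frames can be seen in a toy example. This sketch uses plain frame differencing on a one-dimensional "frame" with no motion compensation, transform, or quantization, so it illustrates only the principle; the function names are hypothetical.

```python
def encode_inter(previous, current):
    """Code a frame as its pixel-wise difference from the previous frame."""
    return [c - p for p, c in zip(previous, current)]


def decode_inter(previous, residual):
    """Rebuild a frame from the previously decoded frame and the residual."""
    return [p + r for p, r in zip(previous, residual)]


frame1 = [10, 10, 12, 12]
frame2 = [10, 10, 12, 15]              # only the last pixel changed
residual = encode_inter(frame1, frame2)
print(residual)                        # mostly zeros, so it compresses well

# If frame1 was decoded with an error, the error carries into frame2.
damaged_frame1 = [10, 99, 12, 12]
print(decode_inter(damaged_frame1, residual))
```

Inserting an intraframe-coded picture at intervals resets the decoder state, which is how the error propagation shown here is bounded.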

B.         VIDEO  CODING  HIERARCHY 

To  facilitate  different  aspects  of  video  coding  and  decoding,  the  video  stream  is 
organized  into  a  hierarchy  of  logical  elements.  The  organizational  scheme  varies  from 
coder  to  coder,  but  the  most  common  elements  are  presented  below. 

The basic display unit is the picture or frame, which comprises a rectangular
array of pixels; each pixel is in turn a data structure indicating the color and luminosity
at that position. The dimensions of the array represent the picture resolution, given as
columns x rows, where the codec of choice determines the available resolutions. A set
number  of  contiguous  pictures  are  organized  into  a  group  of  pictures  (GOP).  A  GOP 
usually  influences  compression  gain  and  consists  of  an  intraframe  coded  picture  followed 
by  a  series  of  interframe  coded  pictures. 

Within  a  frame,  pixels  are  organized,  in  order  of  increasing  size,  into  blocks, 
macroblocks,  and  groups  of  macroblocks  (GOB)  or  slices.  A  block  is  an  8x8  array  of 
pixels  and  is  the  basic  element  for  transform  coding  operations,  such  as  the  discrete 
cosine transform (DCT). Motion compensation is applied at the macroblock (MB) level,
a 16x16 array of pixels spanning four blocks, to reduce the associated overhead and computational
expense.  A  frame  may  be  viewed  as  being  composed  of  rows  of  macroblocks.  For 
example,  a  frame  with  a  resolution  of  176x144  pixels  contains  nine  rows  of  macroblocks 
with  eleven  macroblocks  per  row.  One  or  more  contiguous  rows  of  macroblocks  are 
termed  a  GOB  or  a  slice  depending  on  the  codec.  GOB  is  the  more  general  term  while 
the term slice is defined within the MPEG-1/2 standards [6]. GOB headers, along with
the  frame  header,  serve  as  reference  points  that  allow  the  decoder  to  resynchronize  with 
the  incoming  bit  stream  after  decode  errors  caused  by  lost  packets  or  bit  errors.  A 
representation  of  the  hierarchy  superimposed  on  the  compressed  bit  stream  is  shown  in 
Figure III.1; the length of each compressed frame varies due to variable compression
gains.

Figure III.1: Organizational Hierarchy for Compressed Video.
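
The macroblock arithmetic in the 176x144 example can be checked with a short sketch. The function name is hypothetical, and the sketch assumes the frame dimensions are exact multiples of the macroblock size.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (rows, macroblocks_per_row) for a frame of width x height pixels."""
    if width % mb_size or height % mb_size:
        raise ValueError("frame dimensions must be multiples of the macroblock size")
    return height // mb_size, width // mb_size


rows, per_row = macroblock_grid(176, 144)
print(rows, per_row)           # nine rows of eleven macroblocks
print(rows * per_row * 4)      # number of 8x8 blocks covering the frame
```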


C.         INTRAFRAME CODING


Intraframe  coding  (or  spatial  compression)  is  essentially  the  same  as  still  image 
compression.  Each  frame  is  compressed  independently  by  removing  redundant 
information  within  that  frame,  balancing  compression  against  image  quality,  and  coding 
the  remaining  information  in  a  more  efficient  manner.  No  attempt  is  made  to  exploit 
temporal  correlation  existing  between  frames.  The  three  steps  comprising  intraframe 
coding are shown in Figure III.2 and are explained further below.

Figure III.2: Overview of the Steps Comprising Intraframe Coding.

1.  Transform Coding

A frame represents a sampled version of the original scene at a single instant in
time. Contiguous regions of samples (or pixels) tend to be highly correlated, and in
practice compression through direct scalar quantization is inefficient, although direct
techniques are still employed where lossless compression is the primary concern. Instead,
application of a suitable linear transform to decorrelate the samples gives a greater level
of compression for a given encoder complexity [48].

A suitable transform increases compression efficiency as follows. A signal is
decorrelated if application of the transform results in diagonalizing the signal's
autocorrelation matrix. Equivalently, the resulting transform coefficients are not
correlated. An optimal transform tightly packs energy into the smallest number of
coefficients possible, a property known as "energy packing" efficiency [48]. The
advantage is that if the coefficients are arranged in decreasing order of magnitude,
retaining only the first k out of N coefficients gives the least distortion as measured by
MSE. Consequently, although the transform itself is lossless, a given level of
quantization results in the least distortion of the original data.

Another advantage of transforms is that the new domain is often more appropriate
for perceptual-based quantization. Certain transform coefficients may hold greater
perceptual relevance. For example, the human visual system (HVS) places the most
importance on low frequency details in images or video [6]. This dependency may be
exploited using frequency-based transforms and then distributing quantization errors in
relation to the relative importance of each coefficient.

In  theory,  the  discrete-time  Karhunen-Loeve  transform  (KLT)  provides  the 
greatest  energy  packing  efficiency  [49].  However,  the  KLT  is  both  computationally 
intensive (order of N^2) and signal dependent, thus requiring a separate eigenvector
calculation  for  each  transformed  data  block.  These  liabilities  preclude  the  use  of  the  KLT 
in  video  compression.  Instead,  video  coders  use  transforms  that  approximate  the  KLT's 
energy  packing  efficiency  and  possess  more  efficient  algorithms. 

The  most  widely  used  transform  for  image  processing  is  the  two-dimensional 
discrete  cosine  transform  (DCT).  The  DCT  provides  the  closest  energy  packing 
performance  to  the  KLT,  and  numerous  fast  algorithms  are  available,  frequently 
implemented in hardware, that reduce the computational effort to the order of N log_2 N [6].
For example, a 2-D DCT can be implemented with as few as 54 multiplication
operations  [50]. 

A frame is transformed by dividing its elements into NxN blocks of pixels and
applying the 2-D DCT to each individual block. The typical block size is 8x8. Larger
block sizes are possible, but the pixels tend to be less correlated, which decreases the
resulting compression gain. Denoting the original block as f(i,j) and the transformed
coefficient block as F(u,v), the 2-D DCT is given by [6]

$$F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i,j) \cos\left[\frac{(2i+1)\pi u}{2N}\right] \cos\left[\frac{(2j+1)\pi v}{2N}\right] \tag{III-1}$$

where u and v are the horizontal and vertical frequencies, respectively, and

$$C(x) = \begin{cases} \dfrac{1}{\sqrt{2}} & x = 0 \\ 1 & \text{otherwise.} \end{cases} \tag{III-2}$$

The inverse DCT is given by

$$f(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v) \cos\left[\frac{(2i+1)\pi u}{2N}\right] \cos\left[\frac{(2j+1)\pi v}{2N}\right] \tag{III-3}$$
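
The forward and inverse transforms can be evaluated directly from their definitions. The sketch below is a straightforward, unoptimized evaluation of the standard orthonormal 2-D DCT with a 2/N scale factor and C(0) = 1/sqrt(2); it is not one of the fast algorithms mentioned in the text, and the function names are hypothetical.

```python
import math


def c(x):
    """Normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if x == 0 else 1.0


def _basis(n, s, k):
    """Cosine kernel cos[(2s + 1) * pi * k / (2N)] shared by both transforms."""
    return math.cos((2 * s + 1) * math.pi * k / (2 * n))


def dct2(f):
    """Forward 2-D DCT of an N x N block f[i][j]; returns F[u][v]."""
    n = len(f)
    return [[(2 / n) * c(u) * c(v)
             * sum(f[i][j] * _basis(n, i, u) * _basis(n, j, v)
                   for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]


def idct2(F):
    """Inverse 2-D DCT of an N x N coefficient block F[u][v]; returns f[i][j]."""
    n = len(F)
    return [[(2 / n)
             * sum(c(u) * c(v) * F[u][v] * _basis(n, i, u) * _basis(n, j, v)
                   for u in range(n) for v in range(n))
             for j in range(n)]
            for i in range(n)]


flat = [[10.0] * 8 for _ in range(8)]      # a block with no spatial variation
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))              # all energy lands in the DC term
```

For a constant block every AC coefficient evaluates to zero up to floating-point error, which is the energy-packing behaviour in its simplest form.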

Transforming an 8x8 block of pixels results in a block of 64 coefficients with a
spatial frequency distribution as shown in Figure III.3. The F(0,0) coefficient represents
the DC value while the remaining coefficients are termed AC coefficients. Figure III.4
indicates how image elements map into the frequency domain via the 2-D DCT [6].
Individual blocks within a frame tend to show little variation from pixel to pixel, an
indication of low-pass frequency content. Given this condition, the magnitude of the
DCT coefficients is largest in the region about the DC coefficient and diminishes with
increasing frequency.

[Figure: block of DCT coefficients with horizontal frequency along one axis and vertical frequency along the other; the DC coefficient is marked, and the remaining entries are labeled AC coefficients.]

Figure III.3: Frequency Interpretation of DCT Coefficients.

Figure III.4: Structural Decomposition of Image Elements [6].

The need for data blocking in DCT-based compression becomes a liability with
high levels of compression. Compression tends to remove high-frequency components,
which leads to smoothing of the visual content of each block and creates "blocking
artifacts" that disturb the continuity of the frame. The same effect also leads to the
presence of "ringing" artifacts around sharp edges [3].

2.  Scalar  Quantization 

The  DCT  coefficients  are  quantized  to  reduce  precision,  which  allows  each 
coefficient  to  be  represented  with  fewer  bits.  Quantization  may  also  remove  the  least 
significant  coefficients  by  setting  their  value  to  zero.  The  tradeoff  is  added  quantization 
noise,  which  shows  up  as  distortion  within  the  reconstructed  image.  The  most  typical 
quantization scheme employed is uniform quantization wherein each coefficient $F_{u,v}$ is
divided by the quantizer step size $Q_{u,v}$ and the result rounded to the nearest integer as
follows [5]:

$$\hat{F}_{u,v} = \mathrm{round}\left(\frac{F_{u,v}}{Q_{u,v}}\right), \quad \forall\, u, v. \tag{III-4}$$

The reconstructed value is found by multiplying the quantized coefficient by the
quantizer step value, $\hat{F}_{u,v} \times Q_{u,v}$. As Eq. (III-4) implies, the quantizer step value may vary
with each DCT coefficient as discussed below. In this case, $Q_{u,v}$ represents an element
from an NxN quantizer matrix. Alternatively, a single value may be used for the entire
block  for  simplicity.  Although  uniform  quantization  is  widely  used,  the  choice  is  not 
optimal  since  analysis  has  shown  that  individual  coefficients  are  not  distributed 
uniformly  [51].  Other  approaches  have  been  suggested  to  reduce  the  quantization  error, 
such  as  employing  a  separate  Max-Lloyd  quantizer  for  each  coefficient  [52],  but  the  gain 
does  not  appear  to  outweigh  the  computational  effort. 
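
Uniform quantization and reconstruction can be sketched as follows. The step sizes and coefficient values are made up for illustration and are not taken from any standard quantizer matrix; ties are rounded upward here, which is one of several possible conventions.

```python
import math


def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [[math.floor(f / q + 0.5) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(coeffs, steps)]


def dequantize(levels, steps):
    """Reconstruct coefficients by multiplying each level by its step size."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, steps)]


coeffs = [[80.0, -13.0], [7.0, 2.0]]   # illustrative 2x2 corner of a block
steps = [[16, 11], [12, 14]]           # hypothetical quantizer matrix entries
levels = quantize(coeffs, steps)
print(levels)
print(dequantize(levels, steps))       # differs from coeffs: quantization noise
```

The small coefficient 2.0 is quantized to zero, which shows how quantization also discards the least significant coefficients.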

Since not all coefficients are significant, some may be discarded prior to
quantization [6]. In maximum variance zonal sampling, the coefficients are ordered by
the magnitude of their variance and a fraction of the $N^2$ coefficients with the largest
variances are retained with the remaining coefficients set to zero. Threshold sampling
performs the same function but retains coefficients on the basis of the largest magnitude
[6].
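
The difference between the two selection rules can be illustrated with a short sketch. Coefficient blocks are flattened to lists for brevity, the variance for zonal sampling is estimated over a small set of example blocks, and all names and data are hypothetical.

```python
def threshold_sample(coeffs, keep):
    """Keep the `keep` largest-magnitude coefficients of one block; zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    return [value if k in kept else 0 for k, value in enumerate(coeffs)]


def zonal_positions(blocks, keep):
    """Choose the `keep` coefficient positions with the largest variance."""
    n = len(blocks[0])
    means = [sum(b[k] for b in blocks) / len(blocks) for k in range(n)]
    variances = [sum((b[k] - means[k]) ** 2 for b in blocks) / len(blocks)
                 for k in range(n)]
    order = sorted(range(n), key=lambda k: variances[k], reverse=True)
    return set(order[:keep])


print(threshold_sample([50, -3, 12, 1], keep=2))
training = [[10, 1, 5, 0], [30, 2, -5, 0], [20, 0, 5, 1]]
print(sorted(zonal_positions(training, keep=2)))
```

Zonal sampling fixes the retained positions in advance from signal statistics, whereas threshold sampling adapts to each block and therefore needs the retained positions to be signaled to the decoder.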

However, the most common approach is to weight the relative importance of each
coefficient by careful selection of quantizer step values $Q_{u,v}$. Small quantizer step values
yield less distortion but require more bits. Larger quantizer step values introduce larger
distortion but tend to result in more zeros and require fewer bits. Choosing the optimal
step size requires selecting a suitable criterion, either through a bit-allocation approach or
human visual system (HVS) modeling. In bit-allocation, the magnitude is chosen to
minimize  distortion  within  a  bit  budget  for  the  block  or  frame.  One  optimal  scheme 
varies  each  quantizer  in  proportion  to  the  variance  of  the  coefficient,  which  yields  the 
same  average  distortion  for  each  coefficient  [48].  However,  bit  allocation  schemes  fail  to 
account  for  human  sensitivity  to  different  spatial  frequencies.  Instead,  most  international 
coding  standards,  such  as  JPEG  and  MPEG,  employ  quantizer  matrices  based  on  HVS 
models.  Using  HVS  models  as  a  reference,  the  quantizer  step  sizes  are  chosen  such  that 
lower  frequency  coefficients  are  quantized  more  finely  while  higher  frequency 
coefficients  are  quantized  more  coarsely  [6].  The  HVS  is  also  more  sensitive  to 
luminance  intensity  than  chrominance,  so  different  quantizer  matrices  are  developed  for 
each. 

A desirable feature in video encoders is the inclusion of rate control for the
outgoing compressed video stream since each frame's compression gain depends on the
frame's contents. For example, the encoder may attempt to maintain a constant bit rate or
a constant average bit rate, or to allow bit rate to vary without constraint. Control is
exercised by varying video quality to achieve the desired bit rate. Referring to Figure
III.2, only the quantizer introduces distortion and affects the reconstructed quality of the
frame. Therefore, rate control schemes use feedback to dynamically alter the distortion
introduced at the quantizer. (The intermixture of intraframe and interframe coding also
affects the bit rate but is usually set prior to encoding and not varied dynamically.) The
simplest approach is to apply a scaling factor to the quantizer matrix to increase or
decrease the magnitude of each element. However, controlling bit rate reduces the coder's
freedom to control quality. CBR video displays wider variations in visual quality compared
to VBR video, which does not constrain bit rate.
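
A feedback loop of this kind can be sketched with a simple proportional update of the quantizer scaling factor. This is not the rate controller developed in this work; the update rule, the gain, and the clamping limits are all hypothetical choices made only to show the direction of the feedback.

```python
def update_scale(scale, bits_produced, bits_target,
                 gain=0.5, lowest=0.25, highest=8.0):
    """Adjust the quantizer-matrix scaling factor after coding one frame.

    A frame that overshoots its bit target raises the scale (coarser
    quantization, fewer bits); an undershoot lowers it (finer quantization).
    """
    error = (bits_produced - bits_target) / bits_target
    return min(highest, max(lowest, scale * (1 + gain * error)))


print(update_scale(1.0, bits_produced=9600, bits_target=6400))   # overshoot
print(update_scale(1.0, bits_produced=3200, bits_target=6400))   # undershoot
```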

3.  Entropy  Encoding 

The  quantized  coefficients  may  be  represented  in  a  more  efficient  manner  using 
source  or  entropy  coding,  thereby  further  increasing  the  compression  gain.  Video  coders 
use  a  combination  of  run-length  encoding  and  variable  length  coding. 

Run-length  encoding  (RLE)  is  the  simplest  form  of  entropy  coding  and  is 
frequently  employed  in  both  lossless  and  lossy  compression  schemes.  Using  RLE,  a  data 
block  is  parsed  to  locate  sequences  of  repetitive  values.  Each  sequence  is  replaced  by  a 
codeword  consisting  of  a  delimiter  and  the  number  of  times  the  value  is  repeated.  If  the 
data  block  contains  a  great  deal  of  repetitive  information,  a  significant  reduction  in  size  is 
possible.  Following  quantization,  the  coefficient  block  typically  contains  a  large  number 
of  zeros,  especially  amongst  the  high-frequency  coefficients  [6].  As  the  compression 
gain  depends  on  the  length  of  the  sequence,  rearranging  the  coefficient  block  as  a  vector 
in  zig-zag  fashion,  starting  from  the  DC  coefficient  down  to  the  F(8,8)  coefficient,  has 
been  demonstrated  to  increase  the  run-length  of  the  zeros.  Different  codewords  are  used, 
but  the  most  common  scheme  consists  of  the  run-length  of  zeros  followed  by  the  size  or 
magnitude of the next non-zero value. If no non-zero values remain, a special end-of-block
codeword  replaces  the  sequence. 
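
The zig-zag scan and the (run, value) codeword scheme described above can be sketched as follows. This is a minimal illustration on a 4 x 4 block rather than the 8 x 8 block used by DCT coders; the function names and the "EOB" marker are assumptions, and the DC coefficient is not treated separately as it would be in a real coder.

```python
EOB = "EOB"  # assumed end-of-block marker


def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def run_length_encode(block):
    """Encode a quantized block as (zero_run, value) pairs followed by EOB."""
    symbols, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    symbols.append(EOB)  # trailing zeros are implied by the end-of-block symbol
    return symbols


block = [[30, -6, 0, 0], [-5, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
symbols = run_length_encode(block)
print(symbols)  # [(0, 30), (0, -6), (0, -5), (9, 1), 'EOB']
```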

After  RLE,  the  quantized  coefficient  block  is  represented  by  a  set  of  codewords 
with  each  representing  a  symbol  drawn  from  a  larger  source  alphabet.  Variable-length 
coding  (VLC)  minimizes  the  average  codeword  length  by  assigning  shorter  codewords  to 
the  most  probable  symbols  and  longer  codewords  to  the  least  likely  symbols,  and  each 
codeword  is  uniquely  decipherable.  Huffman  coding  is  the  most  widely  used  entropy- 
encoding  algorithm  and  is  guaranteed  to  produce  a  minimum  average  length,  uniquely 
decipherable  code  [5].  The  Huffman  algorithm  uses  each  symbol's  probability  of 
occurrence  and  builds  a  prefix  code  using  an  optimum  binary-branching  tree.  Since  both 
the coder and the decoder need to use the same codebook and generating a Huffman table
is computationally expensive, standard tables are normally pre-defined using data drawn
from  test  images.  An  optimal  representation  is  not  guaranteed,  but  encoding  and 
decoding  are  faster  and  the  need  to  transmit  the  VLC  table  is  avoided. 
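
A minimal sketch of Huffman code construction is given below, assuming that symbol counts are already known. The symbol set and counts are hypothetical; a practical video coder would instead ship with pre-defined tables as noted above.

```python
import heapq


def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from {symbol: count}."""
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)  # two least probable subtrees
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical symbol counts.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(freqs)
total_bits = sum(freqs[s] * len(code[s]) for s in freqs)
print(code, total_bits)
```

The most probable symbol receives the shortest codeword, and no codeword is a prefix of another, which is what makes the code uniquely decipherable.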

4.  Quality  of  Reproduced  Video 

Given  that  video  coders  trade  compression  gain  for  image  quality,  quantifying  the 
level  of  distortion  introduced  due  to  coding  is  useful  in  evaluating  different  coding 
schemes. A useful measure of image distortion D is the mean square error
(MSE) between the original ($x$) and reconstructed ($\hat{x}$) images [6]:

$$D = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(x_{i,j}-\hat{x}_{i,j}\right)^{2}, \qquad \text{(III.5)}$$

where N and M are the image dimensions. Using the MSE to quantify distortion D, the
signal-to-noise ratio (SNR) is determined as

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^{2}}{D}, \qquad \text{(III.6)}$$

where $\sigma^{2}$ is the input variance. The most widely published measure of image quality is
the peak signal-to-noise ratio given by [6]

$$\mathrm{pSNR} = 10\log_{10}\frac{K^{2}}{D}, \qquad \text{(III.7)}$$

where K is the maximum peak-to-peak value in the image, 255 for the typical 8-bit
image. For example, a typical peak SNR for a JPEG-encoded grayscale image is
28 dB at 0.5 bits/pixel [6].
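
The MSE and peak SNR definitions above translate directly into code. The sketch below assumes images stored as lists of rows of 8-bit intensities; the tiny 2 x 2 test images are made up for illustration.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized images."""
    n, m = len(original), len(original[0])
    total = sum(
        (x - y) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for x, y in zip(row_o, row_r)
    )
    return total / (n * m)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    d = mse(original, reconstructed)
    if d == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / d)


orig = [[100, 100], [100, 100]]
recon = [[110, 110], [110, 110]]  # every pixel off by 10, so MSE = 100
print(mse(orig, recon), round(psnr(orig, recon), 2))
```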

Using  MSE  as  a  measure  of  image  quality  does  have  drawbacks.  MSE  does  not 
distinctly  relate  to  perceptual  quality  since  all  errors  are  given  equal  weight.  Two 
compression  techniques  yielding  the  same  MSE  for  an  image  may  deliver  slight 
differences  in  perceptual  quality  [6]. 

D.         INTERFRAME  CODING 

Interframe  coding  exploits  frame  to  frame  correlation  or  temporal  redundancy  to 
deliver  greater  compression  gains  for  a  given  level  of  quality.  The  degree  of  redundancy 
depends on the scene's motion content due to either motion of objects within the scene or
scene movement caused by a camera pan. Static scenes with little motion show a high
amount of frame-to-frame redundancy. For example, the VTC scenario considered in this
work assumes motion video sequences consisting of a "talking head," i.e., a single speaker
talking against a static background. An opposite example is a scene change, where
successive  frames  have  completely  different  content. 

Several  source-coding  techniques  are  employed  to  remove  temporal  redundancy 
including  block  updating,  differential  pulse  code  modulation  (DPCM),  and  motion 
compensation.  Each  technique  is  suitable  for  a  certain  range  of  motion  content. 
Generally,  exploiting  redundancy  as  motion  content  increases  requires  more  complex 
techniques,  which  in  turn  decrease  decoder  robustness.  As  stated  above,  interframe 
coding  offers  the  potential  for  a  lower  bit  rate  for  a  given  level  of  quality.  Conversely, 
interframe  coding  offers  better  quality  for  a  given  bit  rate.  The  relative  gain,  as  compared 
to intracoding, for the intercoding techniques presented here is documented in [53] for
low  and  high  motion  video  sequences. 

1.  Block  Updating 

The  simplest  interframe  coding  approach  is  a  simple  variation  of  intraframe 
coding.  In  low  motion  video  scenes,  such  as  "talking  head"  video,  motion  is  confined  to 
a  small  region  within  the  scene  while  the  background  remains  static.  Block  updating 
conserves  bandwidth  by  coding  and  transmitting  only  those  blocks  that  have  changed 
perceptibly since the last frame [54]. Each block f(i,j) is compared to its counterpart in
the previous frame, and a distance metric is calculated. If the distance is below a certain
threshold, no update for that block is transmitted. Otherwise, the block is intracoded as in
Figure III.2 and transmitted. Block updating is sometimes combined with an aging
scheme that periodically forces block updates, which mitigates hysteresis problems and
guarantees that members joining a dynamic VTC session receive the full scene within
some  set  interval  [45]. 
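
A minimal sketch of the block-update decision with aging follows. It assumes the absolute sum of differences as the distance metric (the text does not fix the metric at this point), and the threshold, maximum age, block contents, and function names are all hypothetical.

```python
def block_distance(a, b):
    """Absolute sum of differences between two blocks (assumed metric)."""
    return abs(sum(x - y for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def select_updates(current, previous, ages, threshold, max_age):
    """Return indices of blocks to intracode; ages is updated in place."""
    updates = []
    for idx, block in current.items():
        changed = block_distance(block, previous[idx]) >= threshold
        expired = ages[idx] >= max_age  # aging forces a periodic refresh
        if changed or expired:
            updates.append(idx)
            ages[idx] = 0
        else:
            ages[idx] += 1
    return updates


previous = {0: [[10, 10], [10, 10]], 1: [[10, 10], [10, 10]], 2: [[7, 7], [7, 7]]}
current = {0: [[10, 10], [10, 10]], 1: [[50, 52], [49, 51]], 2: [[7, 7], [7, 7]]}
ages = {0: 0, 1: 3, 2: 29}

updates = select_updates(current, previous, ages, threshold=16, max_age=29)
print(updates, ages)  # block 1 changed, block 2 aged out
```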

2.  Differential  Pulse  Code  Modulation

Another  approach  suitable  for  low  motion  video  is  DPCM.  DPCM  is  a  first  order 
predictor  that  uses  only  the  most  recent  sample  to  predict  the  next  sample.  Denoting  the 
current frame as k and the reference frame as k - 1, DPCM subtracts the reference block
f(i,j,k - 1) from the predicted block f(i,j,k). The resulting error block e(i,j,k) represents the
prediction error between the predicted block and the reference block. Although little
correlation  is  left  in  the  error  block  on  average  [48],  the  error  block  is  compressed  as 
shown  in  Figure  III.2,  which  results  in  an  approach  known  as  hybrid  video  coding.  If  the 
prediction  error  is  small,  the  dynamic  range  of  the  pixels  is  considerably  reduced, 
possibly  down  to  zero,  and  DCT-based  coding  yields  a  higher  compression  relative  to 
intracoding  the  original  block  since  the  error  block  has  a  predominant  lowpass 
characteristic. 

Open loop DPCM has the disadvantage that errors introduced by quantization
tend  to  accumulate  over  time  at  the  decoder.  Adding  a  feedback  loop  to  the  coder 
mitigates  this  problem.  The  predicted  block  is  compared  to  a  reconstructed  version  of  the 
last  frame  maintained  by  the  coder  instead  of  the  actual  frame.  Using  the  decoded  frame 
as  a  reference  compensates  for  quantizer  error  introduced  by  the  coding  process. 
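
The difference between open-loop and closed-loop prediction can be seen on a one-dimensional sample sequence. The sketch below is an illustration only: the uniform quantizer, step size, and sample values are assumptions, and a real coder would operate on transformed blocks rather than on scalar samples.

```python
import math


def quantize(e, step):
    """Uniform quantizer: round the prediction error to a multiple of step."""
    return step * math.floor(e / step + 0.5)


def dpcm_open_loop(samples, step):
    """Coder predicts from the ORIGINAL previous sample."""
    recon, prev_orig, prev_recon = [], 0, 0
    for s in samples:
        prev_recon += quantize(s - prev_orig, step)  # decoder adds quantized error
        recon.append(prev_recon)
        prev_orig = s
    return recon


def dpcm_closed_loop(samples, step):
    """Coder predicts from the RECONSTRUCTED previous sample, as the decoder does."""
    recon, prev_recon = [], 0
    for s in samples:
        prev_recon += quantize(s - prev_recon, step)
        recon.append(prev_recon)
    return recon


samples = [3, 6, 9, 12, 15, 18]
open_out = dpcm_open_loop(samples, step=4)
closed_out = dpcm_closed_loop(samples, step=4)
print(open_out)    # [4, 8, 12, 16, 20, 24]: error grows by one each sample
print(closed_out)  # [4, 8, 8, 12, 16, 20]: error stays within half a step
```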

3.  Forward  Motion-Compensated  Prediction 

DPCM  gives  the  best  results  when  a  scene  is  mostly  static.  With  increasing 
motion  content,  the  probability  of  poor  correlation  between  the  predicted  block  and  the 
reference  block  increases.  Past  some  point,  DPCM  actually  yields  inferior  performance 
relative  to  intracoding.  Assume  that  the  predicted  block  contains  a  discrete  object,  such 
as  a  ball.  If  the  ball  does  not  move,  DPCM  gives  good  results  since  the  best  reference 
block  is  at  the  same  coordinate  as  the  predicted  block.  If  the  ball  is  moving,  the  best 
matching  reference  block  is  offset  relative  to  the  predicted  block,  and  DPCM  delivers 
poor  results. 

Motion  compensation  improves  DPCM  by  comparing  the  predicted  block  to  some 
region  within  the  reference  frame  and  finding  a  reference  block  that  best  matches  the 
predicted block. The best match is determined by some criterion such as minimum
distance or maximum correlation. Since the search process is computationally intensive,
real-time applications confine the search only to a small region about the predicted block
while off-line coding may search the entire reference frame. The resulting error block is
encoded as previously described under DPCM. Since the decoder needs the location of
the reference block, a motion vector accompanies the encoded error block. The motion
vector represents the location of the reference block as an offset (x,y) from the predicted
block. DPCM is a special case of forward motion-compensation, using a motion vector
of  (0,0). 
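
A minimal full-search block-matching sketch is shown below. It assumes a sum of absolute differences as the matching criterion and a square search window of a given radius; the frame contents, block size, and function names are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block offset by (dx, dy)."""
    return sum(
        abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
        for r in range(size)
        for c in range(size)
    )


def motion_search(cur, ref, bx, by, size, radius):
    """Return (dx, dy, cost) of the best reference block within +/- radius pixels."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, block_sad(cur, ref, bx, by, 0, 0, size))  # (0,0) is plain DPCM
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > w or y + size > h:
                continue  # candidate falls outside the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


def make_frame(h, w, top, left, size=2, value=200):
    """Black frame containing one bright square (a stand-in for the moving ball)."""
    frame = [[0] * w for _ in range(h)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = value
    return frame


ref = make_frame(8, 8, top=2, left=2)
cur = make_frame(8, 8, top=3, left=4)  # object moved 2 right and 1 down
best = motion_search(cur, ref, bx=4, by=3, size=2, radius=2)
print(best)  # (-2, -1, 0)
```

The zero vector is evaluated first and replaced only by a strictly better candidate, so a static block falls back to simple DPCM.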

Motion vectors add overhead to the encoding process with two
implications for video coding. First, interframe coders apply motion compensation at the
macroblock level by associating four blocks with a single motion vector to reduce
overhead. Second, motion compensation is only employed when a net gain in
compression is possible over DPCM or intracoding after taking the overhead due to the
motion vector into account. Most coders use the distance metric to determine the most
appropriate method for encoding each macroblock, i.e., intercoding, either with motion
compensation or DPCM, or intracoding. (The picture type may further influence the
decision process, as in MPEG.)

4.  Bi-directional  Motion  Compensation

Forward motion compensation fails when no suitable reference exists in the
previous frame. Such a situation arises whenever a scene change occurs or when motion
reveals objects that are concealed in the previous frame. Bi-directional motion
compensation improves coding in these situations by selecting the best reference block
from either the previous frame or the subsequent frame. As before, the error block is
encoded and transmitted along with a motion vector and a flag indicating which frame
serves as the reference. The coder may also interpolate from the best matches in each
reference frame although this approach requires transmission of two motion vectors.

The cost of adding bi-directional prediction is considerable and limits its
suitability to off-line or non-real-time compression. The need to search two reference
frames doubles both computational expense and buffer requirements. Also, the reliance
on past and future frames requires that both the coder and decoder delay processing of
the current frame until the subsequent frame is available.

5.  Distance  Metrics 

In motion compensation, distance metrics are used to quantify the distortion
between a candidate reference block and the predicted block. The candidate reference
block that generates the least distortion provides the best match. Three
distance metrics commonly employed are [6] [45]: mean squared error (MSE), sum of
absolute differences (SAD), and absolute sum of differences (ASD). The corresponding
mathematical expressions are given by:

$$\mathrm{MSE}(i,j) = \frac{1}{NM}\sum_{m=1}^{N}\sum_{n=1}^{M}\left(x_{m,n}-\hat{x}_{m+i,n+j}\right)^{2} \qquad \text{(III.8)}$$

$$\mathrm{SAD}(i,j) = \sum_{m=1}^{N}\sum_{n=1}^{M}\left|x_{m,n}-\hat{x}_{m+i,n+j}\right| \qquad \text{(III.9)}$$

$$\mathrm{ASD}(i,j) = \left|\sum_{m=1}^{N}\sum_{n=1}^{M}\left(x_{m,n}-\hat{x}_{m+i,n+j}\right)\right|, \qquad \text{(III.10)}$$

where $x_{m,n}$ represents the pixel intensities within the N x M predicted block while
$\hat{x}_{m+i,n+j}$ represents the pixel intensities in the, possibly offset, reference block. The
reference block is offset relative to the predicted block by the motion vector (i,j).
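
The three metrics named above can be sketched as follows for two equally sized blocks, with the reference block assumed to have been extracted at the desired offset already. The block values are made up to show how the metrics respond to zero-mean noise.

```python
def mse(pred, ref):
    """Mean squared error: one multiplication per pixel."""
    count = len(pred) * len(pred[0])
    return sum((x - y) ** 2 for rp, rr in zip(pred, ref) for x, y in zip(rp, rr)) / count


def sad(pred, ref):
    """Sum of absolute differences: absolute value taken per pixel."""
    return sum(abs(x - y) for rp, rr in zip(pred, ref) for x, y in zip(rp, rr))


def asd(pred, ref):
    """Absolute sum of differences: absolute value taken after the summation."""
    return abs(sum(x - y for rp, rr in zip(pred, ref) for x, y in zip(rp, rr)))


pred = [[1, 2], [3, 4]]
ref = [[2, 1], [4, 3]]  # per-pixel differences are -1, +1, -1, +1
print(mse(pred, ref), sad(pred, ref), asd(pred, ref))  # 1.0 4 0
```

In this example the differences cancel in the ASD but not in the SAD, which illustrates why ASD is less sensitive to zero-mean capture noise.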

Although several H.261 video codec implementations employ MSE as a distance
measure [6], MSE requires expensive multiplication operations, which makes it less
suitable for real-time applications. SAD and ASD require the less complex absolute
value operator and otherwise require only addition operations. SAD was incorporated
into the H.263 test model [55], an approach probably adopted by commercial
implementations. ASD has found use in block updating since taking the absolute value
after the summation reduces the impact of noise introduced during video capture, thereby
reducing spurious background updates in low motion video [45].

6.  Hybrid  Video  Coding

Hybrid  video  coding  combines  motion  compensation  with  the  DCT-based  coder 
shown in Figure III.2. A functional block diagram of a hybrid coder is shown in Figure
III.5. Similar to intracoding, the current frame is broken into a sequence of macroblocks,
and a separate coding decision is made for each macroblock. The motion estimation
block compares each macroblock to the reference frame(s) and decides whether
intracoding or intercoding is more appropriate. For example, Telenor's H.263 test model
[55]  employs  a  SAD-based  coding  decision  algorithm.  If  intracoding  is  indicated,  DCT- 
based  compression  is  applied  to  each  individual  block  within  the  macroblock.  If 
intercoding  is  selected,  the  reference  macroblock  is  subtracted  from  the  predicted 
macroblock,  and  the  error  block  is  encoded.  The  motion  vector  is  encoded  separately 
using  a  VLC  although  motion  vectors  are  optional  for  simple  DPCM. 

Figure III.5 also illustrates the feedback path used to prevent the accumulation of
quantization errors at the decoder. After each macroblock is quantized, the quantization
and transform operations are reversed, and the results are used to update the reference
frame. Not shown is the controller functionality. The controller implements either open-
loop or closed-loop rate control, in coordination with the network, by controlling
distortion introduced in the quantizer and by controlling encoding decisions available to
the motion estimation block.

E.         ERROR  ROBUSTNESS

Transmission  errors  are  an  inevitable  part  of  any  communication  network  and 
occur  both  within  the  channel  and  within  the  network.  Communication  channels  are 
characterized  by  bit  error  rate  (BER),  typically  10"^  for  fiber-optic  systems  and 
considerably  more  for  copper-based  wireline  and  wireless  systems.  Random  bit  errors  or 
burst  errors  due  to  channel  noise  may  corrupt  either  the  payload  or  the  packet  header. 
Packet  header  errors  are  the  more  serious  of  the  two,  raising  the  potential  for  misrouted 
packets  or  preventing  the  network  from  identifying  the  packet.  Losses  may  be  mitigated 
with  forward  error  correction  and  retransmissions,  but  the  latter  approach  is  untenable 
with  real-time  traffic.  ATM  networks  only  check  for  errors  in  the  cell  header  and  are 
able  to  correct  single-bit  errors  [18].  If  multiple  bit  errors  are  detected,  the  cell  is 
discarded.  The  AAL  layer  at  the  receiver  may  handle  payload  bit  errors  or  leave  error 
handling  to  higher  layers.  Network  losses  occur  due  to  buffer  overruns  at  network  nodes 
during  periods  of  congestion  or  when  the  arriving  aggregate  traffic  prevents  the  switch 
from  servicing  each  connection  to  its  required  QoS.  Although  network  architectures, 
such  as  ATM,  allow  a  call  to  specify  cell  loss  probability  prior  to  call  acceptance,  cell 
losses  do  occur,  especially  if  the  transmission  path  employs  a  wireless  interface.  The 
impact of transmission errors depends on the error resilience of the codec.

Each  cell  loss  or  bit  error  degrades  the  quality  of  the  reconstructed  video  stream 
through  two  mechanisms  depending  on  the  type  of  video  coding  employed.  Assume  that 
a  transmission  error  occurs  such  that  a  single  macroblock  is  decoded  incorrectly.  The 
immediate  impact  is  spatial  corruption  within  the  current  frame  [7].  Since  the  error 
disrupts  the  decoder's  synchronization  with  the  bit  stream,  the  corruption  spreads 
spatially  in  scanline  fashion  until  the  decoder  locates  a  valid  symbol  for 
resynchronization.  Therefore,  the  visual  corruption  usually  spreads  through  the 
remainder  of  the  parent  GOB  or  to  the  end  of  the  frame. 

With  intraframe  coding,  spatial  errors  do  not  persist  beyond  the  affected  frame 
since  each  frame  is  coded  independently.  Interframe  coding,  while  giving  greater 
compression gains, increases the impact of spatial errors by providing a propagation path
through subsequent frames. Again, consider the presence of one or more corrupted
blocks  in  the  last  decoded  frame.  In  interframe  coding,  the  last  decoded  frame  serves  as  a 
reference  for  predictive  coding.  Any  error  block  received  in  the  current  frame  that 
references  a  corrupted  block  yields  another  corrupted  block.  Therefore,  spatial 
corruption  propagates  temporally.  With  motion  compensation  enabled,  scene  motion 
carries  decoding  errors  spatially  through  the  scene.  This  is  particularly  distracting  since 
the  human  eye  tends  to  follow  motion  [7].  Duration  of  temporal  errors  is  dictated  by  the 
rate  at  which  intracoded  macroblocks  are  transmitted,  which  is  in  turn  dictated  by  the 
codec.  Factors  impacting  the  relative  error  resilience  of  several  popular  codecs  are 
presented  below. 

1.  Motion  JPEG 

Motion  JPEG  treats  the  video  stream  as  a  sequence  of  still  images,  compressing 
each  frame  using  JPEG.  Since  each  frame  is  encoded  independently,  decoding  errors  are 
limited  to  the  duration  of  the  affected  frame. 

2.  MPEG 

MPEG-1  and  MPEG-2  are  designed  to  deliver  high-quality  audio-video 
compression  for  applications,  such  as  CD-ROM  multimedia,  broadcast  digital  video,  and 
high definition TV. MPEG employs the GOP format shown in Figure III.1 to provide a
tradeoff between compression gain and random access within the video stream [6]. A
GOP includes three picture types; each picture type limits the allowable macroblock
types. I- and P-pictures are anchor pictures and serve as reference frames. I-pictures
allow only intracoded macroblocks. P-pictures allow intracoding and forward motion
prediction from the last anchor picture. B-pictures allow intracoding, bi-directional
motion prediction, and interpolation and use the last and next anchor frames as
references. Although not specified by the MPEG standard, an N-picture GOP normally
starts with an I-picture followed by P-pictures every M frames. The remaining frames are
encoded as B-pictures as shown in Figure III.6. A greater value of N offers greater
compression gain at the expense of random access since the decoder must start at an I-
picture.

Figure III.6: Typical GOP, N = 9, M = 3.

If  an  error  occurs  in  any  anchor  picture,  errors  may  propagate  through  the 
remaining  pictures  in  the  GOP  until  the  next  I-picture  is  received.  An  I-picture  decode 
error  is  the  worst  case  and  results  in  the  longest  propagation  cycle.  Since  MPEG  employs 
motion compensation, decoding errors propagate spatially as well as temporally and have
been  observed  to  grow  and  shrink  depending  on  motion  within  the  frame. 
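
The GOP layout and the reach of an error within one GOP can be sketched as follows. The sketch works in display order and makes simplifying assumptions: references never cross the GOP boundary, and an error in an anchor picture corrupts every picture that depends on it, directly or through a chain of P-pictures. The function names are illustrative.

```python
def gop_pattern(n, m):
    """Picture types, in display order, for an N-picture GOP with an anchor every M frames."""
    return "".join("I" if i == 0 else ("P" if i % m == 0 else "B") for i in range(n))


def affected_pictures(pattern, k):
    """Indices of pictures in the GOP that may be corrupted by an error in picture k."""
    if pattern[k] == "B":
        return [k]  # B-pictures are never used as references
    prev_anchor = max((i for i in range(k) if pattern[i] != "B"), default=None)
    # B-pictures between the previous anchor and k use k as their next anchor.
    start = k if prev_anchor is None else prev_anchor + 1
    return list(range(start, len(pattern)))


pattern = gop_pattern(9, 3)
print(pattern)                        # IBBPBBPBB
print(affected_pictures(pattern, 0))  # I-picture error: the whole GOP
print(affected_pictures(pattern, 3))  # P-picture error: pictures 1 through 8
print(affected_pictures(pattern, 4))  # B-picture error: only itself
```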

3.  H.263 

The  ITU  standard  H.263  defines  a  low-bit-rate  video  codec  for  video  transmission 
over the PSTN using V.34 modems. H.263 is optimized for bit-rates of 28.8 kbps and
less  and  offers  quality  superior  to  MPEG  at  bit-rates  less  than  64  kbps. 

H.263 employs the video hierarchy shown in Figure III.1 without the GOP
structure.  H.263  coding  resembles  the  concept  of  MPEG  P-pictures.  All  coding 
decisions  are  made  at  the  macroblock  level  and  each  macroblock  is  either  intracoded  or 
intercoded  using  forward  motion  compensation.  To  bound  error  propagation,  the 
standard  specifies  that  a  macroblock  must  be  intracoded  at  least  once  every  132  frames 
[56].  The  lack  of  the  equivalent  of  an  I-picture  to  reset  every  macroblock  at  once,  while 
deliberate,  leaves  H.263  vulnerable  to  prolonged  error  propagation.  Even  with  the 
mandatory  spacing  of  intracoded  blocks,  some  types  of  motion  lead  to  almost  indefinite 
error  propagation  [8]. 

4.  Error  Propagation 

To place error resilience in context, consider the worst-case error propagation
using M-JPEG, MPEG and H.263 compression under the scenario summarized in Table
I.1. With M-JPEG, an error in one frame is corrected upon receipt of the next frame.
The robust nature of M-JPEG makes it suitable for broadband video conferencing [9].
Error propagation in MPEG depends on the GOP size. A typical reported GOP size is
twenty pictures and, given that an error occurs in the I-picture, the worst-case
propagation is twenty frames. For an H.263 coded stream, the worst-case error
propagation depends on how often individual macroblocks are intracoded. The H.263
standard specifies a maximum limit of 132 frames between updates [56]. Assuming an
error occurs in an intracoded block and the block is not intracoded again for 132 frames,
the error could persist as long as 132 frames and possibly even longer given the right
motion patterns [8]. Table III.1 summarizes the worst-case error duration for each of the
three codecs for a frame rate of 10 fps.


Coding Scheme      Worst-case error propagation (seconds)

JPEG               0.10
H.263              13.20
MPEG               2.00

Table III.1: Error Propagation in Popular Video Codecs.
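
The table entries follow from dividing the worst-case propagation length in frames by the frame rate. The short calculation below reproduces them using the frame counts stated in the text (1 frame for JPEG, 132 frames for H.263, and a 20-picture GOP for MPEG); the variable names are illustrative.

```python
FRAME_RATE_FPS = 10

# Worst-case propagation length in frames, as stated in the text.
worst_case_frames = {"JPEG": 1, "H.263": 132, "MPEG": 20}

durations = {codec: frames / FRAME_RATE_FPS for codec, frames in worst_case_frames.items()}
for codec, seconds in durations.items():
    print(f"{codec:6s} {seconds:6.2f} s")
```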

F.         SUBBAND  AND  WAVELET  CODING 

Subband  and  wavelet  coding  are  additional  techniques  for  compressing  still 
images  and  have  been  shown  to  offer  slightly  better  image  quality  than  DCT-based 
schemes  for  similar  levels  of  compression  at  the  cost  of  greater  computational  complexity 
[50].  Subband  and  wavelet  coding  are  fundamentally  similar  in  that  both  decompose  the 
image  into  regions  representing  different  bands  of  spatial  frequencies  present  in  the 
image.  Subband  coders  apply  a  series  of  filters  to  the  image  and  then  decimate  the 
resulting  bands  to  avoid  oversampling  while  wavelet  coders  perform  filtering  and 
decimation  simultaneously  [48].  Of  the  two  methods,  wavelet  techniques  are  more 
common and are examined further here.

In contrast to the DCT, a discrete wavelet transform (DWT) filters and decimates
an image into regions containing mixtures of the high and low frequency details within
the image. Decomposition is performed using two analysis filters. The first extracts low-
frequency content, the signal average, and the other extracts high-frequency content, the
signal details. Example analysis filters for a four-tap biorthogonal DWT are given by
[48]:

$$H_0(z) = -1 + 3z^{-1} + 3z^{-2} - z^{-3} \qquad \text{(III.11)}$$

$$H_1(z) = -1 + 3z^{-1} - 3z^{-2} + z^{-3}. \qquad \text{(III.12)}$$

The inverse transform is performed using the following synthesis filters:

$$G_0(z) = \left(1 + 3z^{-1} + 3z^{-2} + z^{-3}\right)/16 \qquad \text{(III.13)}$$

$$G_1(z) = \left(-1 - 3z^{-1} + 3z^{-2} + z^{-3}\right)/16. \qquad \text{(III.14)}$$
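
As a consistency check on the four filters, the sketch below multiplies their coefficient polynomials and evaluates the two standard conditions for a two-channel filter bank: the distortion term G0(z)H0(z) + G1(z)H1(z) should reduce to a pure delay, and the alias term G0(z)H0(-z) + G1(z)H1(-z) should vanish. The coefficient lists are written in increasing powers of z^-1 with the common 1/16 factor left out; the helper names are assumptions.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def modulate(h):
    """Coefficients of H(-z): negate the odd powers of z^-1."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]


H0 = [-1, 3, 3, -1]   # analysis lowpass
H1 = [-1, 3, -3, 1]   # analysis highpass
G0 = [1, 3, 3, 1]     # synthesis lowpass, to be scaled by 1/16
G1 = [-1, -3, 3, 1]   # synthesis highpass, to be scaled by 1/16

distortion = [a + b for a, b in zip(poly_mul(G0, H0), poly_mul(G1, H1))]
alias = [a + b for a, b in zip(poly_mul(G0, modulate(H0)), poly_mul(G1, modulate(H1)))]
print(distortion)  # [0, 0, 0, 32, 0, 0, 0]
print(alias)       # [0, 0, 0, 0, 0, 0, 0]
```

After the 1/16 scaling and the factor of one half contributed by decimation and interpolation, the distortion term becomes z^-3, i.e., the input is recovered exactly with a three-sample delay.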

Image  compression  proceeds  as  shown  in  Figure  III.7.  A  first  order 
decomposition  creates  four  2-D  subbands  from  the  original  image.  Each  subband  results 
from  the  appropriate  application  of  the  analysis  filters  in  the  horizontal  and  vertical 
directions and decimation by a factor of two. For example, applying Eq. (III.11) in both
the horizontal and vertical directions generates the LL band. Applying Eq. (III.11) in the
horizontal direction and Eq. (III.12) in the vertical direction results in the HL subband. The
remaining  subbands  are  obtained  in  a  similar  manner.  Each  subband  captures  certain 
image  features.  The  LL  subband  retains  the  low-pass  information  within  the  image  and 
displays  a  coarse  representation  of  the  original  image.  Since  most  images  have  a  low- 
pass  characteristic,  most  of  the  image's  energy  is  found  in  the  LL  subband.  High- 
frequency  information  results  from  edges,  which  provide  visual  cues  for  image 
recognition.  The  HL  and  LH  subbands  contain  vertical  and  horizontal  edge  information, 
respectively, while the HH subband contains diagonal edge information.
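
A first-order decomposition can be sketched with the two-tap Haar filters, which are substituted here for the four-tap filters above purely to keep the example short. The subband labels follow the description in the text (HL denotes lowpass filtering horizontally and highpass filtering vertically); the averaging normalization and function names are assumptions.

```python
def haar_1d(samples):
    """Split an even-length sequence into pairwise averages and differences."""
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_columns(half):
    """Apply haar_1d down every column; return (low, high) as lists of rows."""
    cols = [haar_1d(col) for col in transpose(half)]
    return transpose([c[0] for c in cols]), transpose([c[1] for c in cols])


def haar_2d(image):
    """One-level 2-D decomposition; returns (LL, HL, LH, HH)."""
    lows, highs = zip(*(haar_1d(row) for row in image))  # horizontal pass
    ll, hl = filter_columns(list(lows))    # horizontal low, then vertical low / high
    lh, hh = filter_columns(list(highs))   # horizontal high, then vertical low / high
    return ll, hl, lh, hh


subbands = haar_2d([[1, 3], [5, 7]])
print(subbands)  # ([[4.0]], [[-2.0]], [[-1.0]], [[0.0]])
```

The LL output is the local mean of the image, which is why it resembles a coarse, half-resolution copy of the original.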

Figure III.7: DWT-based Image Compression.

The wavelet transform is invertible and lossless and, like the DCT, produces no
compression  gain.  The  compression  gain  results  from  quantization  and  entropy  coding  of 
the  wavelet  coefficients.  As  with  the  DCT,  the  higher  frequency  coefficients  tend  to  be 
less  significant,  so  most  of  the  compression  gain  is  realized  from  compacting  the  detail 
subbands,  especially  the  HH  subband.  In  the  layered  coder  proposed  by  McCanne  and 
Vetterli,  the  HH  subband  is  discarded  entirely  [45].  Subbands  are  usually  quantized 
independently.  The  LL  band  behaves  much  like  the  original  image  and  can  be 
compressed  using  traditional  transform-based  techniques  such  as  JPEG  [57].  The 
remaining  subbands  are  uniformly  quantized  using  a  stepsize  proportional  to  the  variance 
of  the  coefficients  in  that  subband  [48].  Since  the  higher  subbands  tend  to  have  a  large 
number  of  zeros  following  quantization,  run-length  encoding  and  entropy  encoding 
significantly  increase  compression.  Zig-zag  reordering  provides  no  advantage  in  the 
upper bands, so RLE proceeds in scanline fashion, either horizontally or vertically.
Alternatively,  the  quantized  coefficients  are  grouped  and  vector  Huffman  encoded  [58]. 

Greater  compression  is  possible  by  further  decomposing  the  image.  Figure  III.8 
displays  a  second-order  octave-band  decomposition  obtained  by  applying  the  analysis 
filters  to  the  LL  subband  as  described  above.  A  higher-order  decomposition  is  generated 
by  repeatedly  decomposing  the  lowpass  subband.  The  lowpass  band  is  quantized  using 
transform-based  techniques  while  the  remaining  subbands  are  quantized  as  described 
above.  The  increase  in  the  number  of  bands  allows  quantization  and  encoding  to  be 
further  tailored  to  emphasize  perceptual  details  over  less  perceptible  background  noise. 
Alternatively,  the  interdependencies  among  the  subbands  can  be  exploited  using  zero-tree 
entropy coding [59]. Zero-tree coding is analogous to zig-zag scanning in DCT-based
compression. The tree grows from a single coefficient in each of the low frequency
bands  and  gathers  coefficients  in  higher  frequency  bands  that  correspond  to  the  same 
spatial  location  in  the  original  image.  Each  additional  subband  increases  the  size  of  the 
tree  by  a  power  of  four.  Zero-tree  encoding  combines  elegantly  with  bit-allocation  since 
encoding  may  stop  once  the  target  bitrate  is  met.  Conversely,  the  decoder  may  stop  once 
a  desired  level  of  quality  is  achieved. 
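
The parent-child relation underlying the zero tree can be sketched as follows. The sketch assumes the usual octave-band convention in which the coefficient at (r, c) in one subband corresponds to the 2 x 2 group starting at (2r, 2c) in the next finer subband of the same orientation; coordinates are relative to each subband, and the function names are illustrative.

```python
def children(r, c):
    """The four coefficients at the next finer scale covering the same spatial location."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants_per_level(r, c, levels):
    """Coordinates of all descendants of (r, c), one list per finer decomposition level."""
    current, tree = [(r, c)], []
    for _ in range(levels):
        current = [child for rc in current for child in children(*rc)]
        tree.append(current)
    return tree


tree = descendants_per_level(0, 0, levels=3)
print([len(level) for level in tree])  # [4, 16, 64]
```

Each finer level contributes four times as many coefficients as the level above it, so declaring a whole tree zero with a single symbol can save a large number of bits.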

Figure III.8: Octave-band Decomposition.

Wavelet-based  compression  schemes  offer  some  advantages  over  DCT-based 
schemes.  The  DCT-based  approaches  achieve  compression  gain  by  removing  high- 
frequency  content  from  the  image  by  zeroing  the  high-frequency  coefficients  during 
quantization.  Wavelet  transforms  separate  the  image  into  regions  of  high  and  low 
frequency  content,  thus  allowing  more  efficient  bit  allocation  since  different  regions  may 
be  quantized  and  coded  differently.  This  is  advantageous  since  the  DWT  coder  has  the 
option  of  preserving  more  or  less  edge  detail  to  improve  perceptual  image  quality  at 
comparable  pSNR  to  the  DCT.  Another  advantage  is  that  wavelet  transforms  are  not 
applied  to  blocks  within  the  image  but  are  instead  applied  to  the  entire  image.  Therefore, 
at low pSNR, while the DCT demonstrates blocking artifacts, wavelet transforms typically
display a more visually pleasing smoothing effect. In general, wavelet transform coders
offer compression gains, at comparable pSNR, superior to DCT-based coders. When
comparing the state-of-the-art coders, wavelet-based coders offer a 1 dB improvement in
pSNR over DCT-based coders [50].

Several  drawbacks  relative  to  DCT-based  compression  have  limited  the  utility  of 
wavelet-based  video  compression.  Wavelets  achieve  quality  superior  to  DCT-methods 
by  processing  the  entire  image  or  frame.  Motion-compensated  video  coding  exploits 
temporal  correlation  at  the  macroblock  level.  Although  the  error  block  could  be 
transformed  via  a  DWT,  no  significant  advantage  has  been  determined  over  the  DCT,  and 
the computational effort is greater [50]. Many software and hardware "fast"
implementations of the DCT require less than one multiplication per coefficient. Wavelet
transforms are usually bounded to at least one multiplication per coefficient; the fast
Haar transform is the exception, requiring no multiplication operations [60].

This chapter presented the tools required for compressing motion video: transform
methods, quantization, and entropy coding. These tools can be applied to individual
frames independently as in intraframe coding, or used in conjunction with prediction
schemes that capture frame-to-frame correlation as in interframe coding. An important
consideration is that the choice of methods impacts both the complexity and error
robustness of the coder. Therefore, codec suitability for a particular application is to
some degree dependent on the host networking environment. Wavelet-based coding
allows flexibility with frequency content selection to improve compression. The
frequency decomposition offered by DWTs also provides a powerful tool for devising
more robust schemes for video transmission as detailed in the next chapter.

IV.       LOW-COMPLEXITY  LAYERED  VIDEO  CODING

Current  coding  standards,  such  as  H.263  and  MPEG,  make  no  explicit  allowance 
for  network  transmission  and  are  severely  degraded  by  both  bit  errors  and  packet  losses 
[7].  Packet  losses  are  preventable  to  some  extent  with  proper  QoS  guarantees,  but  losses 
due  to  congestion  still  occur.  Of  further  concern  is  the  fact  that  tactical  wireless  links 
exhibit  much  higher  BERs  relative  to  wireline  connections.  Putting  aside  the  matter  of 
BER  as  outside  the  control  of  network  applications,  most  approaches  to  reducing  the 
impact  of  congestion  involve  feedback-based  rate-control  schemes  that  change  the 
coder's  quantization,  resolution,  or  frame  rate.  As  discussed  in  Chapter  II,  RTP  provides 
a  framework  for  a  multimedia  application  to  gauge  the  level  of  congestion  within  the 
network  via  receiver  reports  and  vary  its  target  bit  rate  accordingly. 

A  second  drawback  is  the  poor  flexibility  exhibited  by  traditional  video  codecs  in 
multicast  scenarios  when  video  is  transmitted  over  heterogeneous,  packet-based 
networks.  These  codecs  transmit  the  video  signal  as  a  single  stream  of  packets.  The 
combination of a single video stream and a heterogeneous network suffers from many
limitations [12]. Consider the problem of delivering video to a multicast group consisting
of several recipients connected over the heterogeneous network shown in Figure IV.1.
Examining  the  transmission  paths  leading  from  the  sender  to  the  different  recipients 
reveals an obvious stratification in available bandwidth.* In this scenario, the sender
faces  a  dilemma  when  selecting  an  appropriate  encoder  quality.  Transmitting  high 
quality,  high  bandwidth  video  is  both  acceptable  and  desirable  for  some  recipients. 
However,  low  bandwidth  recipients  will  experience  high  packet  loss  with  a 
commensurate  degradation  in  received  video  quality.  Supporting  the  lowest  common 
denominator  forces  all  recipients  to  view  lower  quality  video,  thereby  underutilizing  high 
bandwidth  links  and  leaving  those  recipients  dissatisfied. 


* A similar heterogeneity could exist in each user's processing and display capabilities.
Figure  IV.1:  Video  Transmission  over  a  Heterogeneous  Network  from  [45]. 

This  chapter  addresses  these  concerns  by  considering  a  layered  video  coder  that  is 
more  suitable  for  network  transmission.  The  concept  of  layered  coding,  especially  in  the 
context  of  receiver-based  layered  multicast  (RLM)  and  previous  layered  coder  proposals 
are  examined.  The  chapter's  primary  focus  is  on  a  new  SNR-scalable  layered  coding 
scheme  appropriate  for  tactical  applications  with  emphasis  on  robust  transmission  and 
low  complexity.  Error  robustness  is  provided  by  eschewing  motion  prediction  in  favor  of 
macroblock  updating,  which  significantly  limits  the  temporal  duration  of  decode  errors 
and  eliminates  any  spatial  migration.  Layering  is  accomplished  via  the  fast  Haar 
transform  (FHT)  with  the  exact  layering  structure  tailored  to  video  content.  The  VTC 
session  is  assumed  to  consist  of  both  low-motion  video,  such  as  a  "talking  head",  and 
static  displays,  such  as  slide  presentations.  Handling  both  types  of  content  with  a  single 
layering  scheme  requires  unacceptable  compromises  since  the  frequency  characteristics 
of  each  are  different.  Therefore,  the  coder  is  optimized  to  handle  each  type  of  content 
separately  by  including  separate  layering  structures  and  custom  VLC  tables.  Finally,  the 
rate  control  problem  is  examined,  and  an  approach  is  proposed  to  reduce  a  k-dimensional 
rate-control  problem  to  a  simple  1-D  table  lookup. 

A.    BACKGROUND 

Several  approaches  are  available  to  meet  the  diverse  quality  expectations  in  the 
multicast group. The sender could encode the input video as a series of separate streams,
where each stream targets a different quality level and bit rate. Each stream is then
transmitted to a different multicast group. Recipients then subscribe to the multicast
group offering the desired quality and bit rate. A multicast group such as that shown in
Figure IV.1 would potentially require targeting three different bandwidths. However,
separate  encoding  presents  some  liabilities  [45].  Transmitting  several  streams  duplicates 
content  and  requires  far  more  bandwidth.  Encoding  several  streams  simultaneously 
requires  considerably  more  computational  effort  than  a  single  stream  and  limits  this 
approach  primarily  to  non-interactive  video-on-demand  applications.  Another  approach 
is  to  use  transcoding  at  routers  wherein  a  high-quality  video  stream  is  decoded  and  then 
encoded  to  a  lower  quality  for  further  transmission  on  a  lower  bandwidth  network  [45]. 
However,  transcoding  requires  specialized  hardware  in  the  transmission  path,  and  the 
additional  delay  introduced  in  reprocessing  the  video  stream  makes  it  less  suitable  for 
interactive  applications. 

As  discussed  above,  feedback  messages  allow  the  sender  to  estimate  network 
conditions  and  adapt  to  the  onset  of  congestion,  thereby  reducing  the  load  on  the  network 
and  ensuring  that  all  recipients  receive  a  minimal  level  of  quality.  RTP  provides  a 
mechanism  for  receiver  reports  but  leaves  the  actual  mechanism  for  interpreting  reports 
and  making  changes  to  the  application.  Other  schemes  have  been  developed  mainly  for 
use  over  LANs  but  could  be  adapted  for  multicast  applications  hosted  over  an  ATM 
network.  One  scheme  proposed  by  Bolot  and  Turletti  [61]  employs  negative 
acknowledgements to indicate network state when the number of recipients is ten or fewer
and uses QoS messages sent periodically with some probability. Sakatani [62] uses
collisions  detected  at  the  MAC  level  and  round-trip  delay  to  measure  the  effect  of 
congestion.  Once  congestion  has  occurred,  quantization  and  frame  rate  are  dropped  to  a 
"slow  start"  bit  rate.  If  indications  of  congestion  disappear,  the  original  bit  rate  is 
resumed.  Other  schemes  have  been  proposed  by  [63]-[65]. 

However,  heterogeneous  networks  complicate  application  of  feedback-based  rate- 
control  schemes.  In  a  multicast  environment,  each  recipient  in  a  VTC  may  observe 
different  degrees  of  congestion.  The  sender's  task  of  interpreting  the  network  state  and 
making  appropriate  changes  is  greatly  complicated  when  sender  reports  indicate  that 
congestion  affects  only  a  small  subset  of  the  multicast  group.  Aggressive  response 
lowers  quality  to  the  entire  multicast  group  while  a  more  conservative  response  tacitly 
drops  some  recipients,  at  least  temporarily.  Feedback-based  control  in  general  is 
problematic.  With  high-bandwidth  networks,  rate-control  schemes  may  not  respond  fast 
enough  to  be  beneficial.  In  low-bandwidth  networks,  any  feedback  scheme  consumes 
bandwidth  although  most  attempt  some  form  of  conservation.  For  example,  RTP  scales 
the  receiver  report  rate  to  the  size  of  the  multicast  group.  Still,  the  notion  of  rate  control 
leads  back  to  the  issue  that  selecting  a  single  level  of  video  quality  in  a  heterogeneous 
environment  is  problematic. 

Layered  video  coding,  especially  in  the  framework  of  receiver-based  layered 
multicast  (RLM)  [45],  provides  a  solution  to  the  shortcomings  outlined  above.  A 
layered-video  coder  encodes  the  video  stream  as  a  base  layer  and  a  series  of  enhancement 
layers,  arranged  in  a  hierarchical  fashion.  The  base  layer  provides  a  minimum  acceptable 
level  of  quality  while  the  enhancement  layers  progressively  refine  the  quality  of  the 
received  video  sequence. 

Layered  video  coding  with  RLM  offers  greater  flexibility  in  handling  the  video 
stream  by  moving  bandwidth  management  from  the  sender  to  the  network  and  the 
individual  recipients.  The  sender  generates  a  layered  video  stream  at  the  highest  quality 
(bandwidth)  supported  by  the  network  to  which  it  is  directly  attached.  Each  member  of 
the  multicast  group  then  subscribes  to  some  or  all  of  the  layers.  The  exact  number 
depends  on  available  bandwidth  and  the  video  quality  desired.  If  high  packet  losses  are 
experienced,  the  recipient  drops  layers  until  satisfactory  reception  is  obtained.  Within  the 
network,  the  video  stream  traverses  a  heterogeneous  mixture  of  subnets.  Each  subnet 
carries  the  maximum  number  of  layers  within  the  bandwidth  available,  retaining  the  most 
perceptually important layers and dropping the rest. Figure IV.2 shows this approach
using the heterogeneous network portrayed in Figure IV.1. Transmitting the video stream
as  a  series  of  scalable  layers  maximizes  utilization  of  each  link  and  maximizes  the  video 
quality  available  to  each  recipient. 
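
The per-subnet behavior described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of RLM itself: the function name `layers_carried`, the layer rates, and the link capacities are hypothetical values chosen for the example. Because the layers are hierarchical, the sketch stops at the first layer whose cumulative rate exceeds the link capacity.

```python
from itertools import accumulate


def layers_carried(layer_rates_kbps, link_capacity_kbps):
    """Return how many layers (base layer first) fit within the link capacity.

    A layer is only useful if every layer below it is also carried, so the
    count stops at the first layer whose cumulative rate exceeds the capacity.
    """
    count = 0
    for cumulative in accumulate(layer_rates_kbps):
        if cumulative > link_capacity_kbps:
            break
        count += 1
    return count


# Hypothetical example: a base layer plus two enhancement layers.
layer_rates = [64, 128, 256]                      # kbps per layer (illustrative)
links = {"wireless": 100, "isdn": 200, "atm": 1000}
carried = {name: layers_carried(layer_rates, cap) for name, cap in links.items()}
print(carried)
```

A recipient behind the most constrained link would therefore receive only the base layer, while recipients on faster links receive progressively more enhancement layers from the same transmitted stream.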
Figure  IV.2:  Video  Transmission  Using  RLM. 

RLM as originally described by McCanne et al. [45] implicitly provides
congestion control without feedback via recipient subscriptions. When experiencing
high packet loss, recipients have the option of dropping the less important layers. As
layers are dropped, routers stop forwarding their packets, thus preserving bandwidth for
more perceptually important layers. This allows more graceful degradation in video
quality  in  the  presence  of  both  congestion  and  other  changes  in  network  loading.  The 
sender  does  not  play  an  active  role  in  congestion  control  although  receiver  reports  could 
be  used  to  drop  or  manipulate  the  upper  layers.  RLM  can  be  improved  by  providing  QoS 
guarantees  for  each  layer  and  exploiting  the  hierarchical  nature  of  layered  video  in 
network  scheduling  decisions.  Chapter  II  discussed  methods  for  multicast  transmission 
of  layered  video  with  QoS  guarantees  using  ATM;  scheduling  algorithms  for  layered 
video  are  covered  in  Chapter  VI. 

RLM also does not explicitly increase error resilience, except that each subnet carries
only those layers capable of being transmitted without excessive packet losses. However,
research  [13]  indicates  that  layered  video  provides  more  error  resilience  than  a  single 
video  stream  of  similar  bandwidth.  Spreading  errors  across  multiple  layers  means  that 
fewer  errors  occur  in  the  base  layer  relative  to  a  single  stream,  and  errors  in  the 
enhancement  streams  are  less  noticeable.  With  ATM  networking,  QoS  can  be  negotiated 
asymmetrically  to  ensure  that  fewer  errors  occur  in  the  most  important  layers. 
B.         LAYERED  VIDEO  CODING 


Delivering  layered,  scalable  video  involves  considerations  in  addition  to  those 
covered  in  the  last  chapter  for  traditional  coders.  The  primary  concern  is  effectively 
separating  the  video  stream  into  hierarchical  layers  as  shown  in  Figure  IV.3.  The  video 
stream  consists  of  a  base  layer  that  offers  acceptable  quality  and  a  series  of  enhancement 
layers  that  progressively  improve  quality  in  terms  of  pSNR,  frame  rate,  or  resolution.  An 
effective  layering  scheme  creates  layers  that  provide  gradual  but  perceptible  increases  in 
video  quality.  Transmitting  an  additional  layer  that  does  not  improve  quality  merely 
wastes  bandwidth.  An  effective  layering  scheme  should  also  create  the  layering 
hierarchy  without  significantly  increasing  computational  expense  as  compared  to 
encoding  a  single  stream  and  with  minimal  additional  bitstream  overhead. 
Figure  IV.3:  Overview  of  Layered  Video  Coding/Decoding. 

Next,  we  consider  some  basic  approaches  for  implementing  the  layering  operation 
implied  in  Figure  IV.3.  Two  avenues  are  considered.  First,  progressive  image 
refinement  schemes,  such  as  progressive  JPEG  and  pyramid  coding,  easily  extend  to  layered 
coding.  Second,  as  mentioned  in  Section  III.F,  multiresolution  techniques  employing 
subband/wavelet  image  coding  extend  in  a  natural  fashion  to  layered  coding.    Each  of 
these  techniques  is  explored  and  illustrated  with  past  and  current  research  on  layered 
coder  design. 
1.  Progressive  JPEG  Encoding 

Progressive  encoding  is  one  of  the  four  encoding  modes  defined  in  the  JPEG 
standard  and  represents  an  extension  to  the  baseline  sequential  coder  presented  in  Figure 
III.2  [66].  Progressive  JPEG  prepares  the  image  for  encoding  in  the  same  manner.  The 
image  is  broken  into  8x8  blocks,  transformed  with  the  2-D  DCT,  and  quantized  using 
either  JPEG  standard  or  customized  tables.  The  difference  lies  in  the  manner  in  which 
the  quantized  DCT  coefficients  are  encoded.  Progressive  coders  segment  the  DCT 
coefficients  and  encode  them  in  multiple  passes  with  each  pass  containing  a  subset  of  the 
frequency  content.  The  goal  is  to  first  transmit  the  most  perceptually  important 
frequency  content  and  then  progressively  improve  quality  with  the  remaining  passes. 
Segmentation  is  performed  via  spectral  selection  or  successive  approximation. 

From  Figure  III.3,  the  DCT  coefficients  are  arranged  from  low  frequency 
components  in  the  upper  left  corner  to  high  frequency  components  in  the  lower  right 
corner.  Spectral  selection  segments  DCT  coefficients  into  spectral  bands  for  encoding, 
where  each  band  includes  a  discrete  set  of  spatial  frequencies.  The  first  spectral  band 
includes  the  DC  coefficient  and  some  number  of  neighboring  AC  coefficients.  Successive 
bands  incorporate  higher  frequency  coefficients  until  all  coefficients  have  been  selected. 
There  are  various  ways  to  select  the  spectral  bands.  One  method  is  to  treat  each  diagonal, 
starting  with  the  DC  coefficient  and  working  right  and  down,  as  a  separate  spectral  band. 
Another method is to group coefficients with similar variances, where each coefficient's
variance is calculated using representative test images [6].
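
As an illustration of the diagonal method, the following Python sketch groups the coefficients of an 8x8 block into bands by diagonal. The function name `diagonal_bands` and the placeholder block contents are assumptions made for the example; a practical coder would typically merge several diagonals into each transmitted band and would operate on quantized DCT coefficients.

```python
def diagonal_bands(block):
    """Group an N x N coefficient block into bands by diagonal index i + j.

    Band 0 holds only the DC coefficient; each later band holds the
    coefficients one step further right and down.
    """
    n = len(block)
    bands = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            bands[i + j].append(block[i][j])
    return bands


# Placeholder 8x8 block whose entries encode their own position (row * 8 + column).
block = [[8 * i + j for j in range(8)] for i in range(8)]
bands = diagonal_bands(block)
print(len(bands))    # number of diagonal bands in an 8x8 block
print(bands[:3])     # the three lowest-frequency bands
```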

Spectral  selection  tends  to  produce  blocking  artifacts  when  using  only  a  few 
spectral  bands  since  low  frequency  content  is  transmitted  first.  Successive  approximation 
provides  more  visually  pleasing  performance  by  transmitting  a  portion  of  all  non-zero 
DCT  coefficients  in  each  pass  [6].  Each  coefficient  is  essentially  a  binary  value  and, 
within  that  binary  value,  the  most  perceptible  content  is  carried  in  the  most  significant 
bits.  Therefore,  on  the  first  pass,  a  specified  number  of  the  most  significant  bits  for  each 
non-zero  coefficient  are  encoded.  On  successive  passes,  the  less  significant  bits  are 
encoded. Successive approximation yields a more graceful transition in image quality
than spectral selection since each pass includes some high frequency content. However,
successive approximation incurs greater coder complexity compared to spectral selection [6].
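
The bit-plane idea behind successive approximation can be sketched as follows. This is a simplified illustration that handles non-negative coefficient magnitudes only and does not reproduce the JPEG bitstream syntax; the function names, the 8-bit magnitude width, and the 4-bit first pass are assumptions chosen for the example.

```python
def successive_passes(coeffs, total_bits=8, first_pass_bits=4):
    """Split non-negative integer magnitudes into bit-plane passes."""
    shift = total_bits - first_pass_bits
    passes = [[c >> shift for c in coeffs]]            # most significant bits first
    for bit in range(shift - 1, -1, -1):               # one refinement bit per pass
        passes.append([(c >> bit) & 1 for c in coeffs])
    return passes


def reconstruct(passes, total_bits=8, first_pass_bits=4, received=None):
    """Rebuild values from the first `received` passes; missing bits are zero."""
    shift = total_bits - first_pass_bits
    received = len(passes) if received is None else received
    values = [v << shift for v in passes[0]]
    for k in range(1, received):
        bit = shift - k
        values = [v | (b << bit) for v, b in zip(values, passes[k])]
    return values


coeffs = [200, 37, 5, 0]                 # illustrative magnitudes
passes = successive_passes(coeffs)
print(reconstruct(passes, received=1))   # coarse approximation from the first pass
print(reconstruct(passes))               # exact values once all passes arrive
```

Every coefficient contributes to the first pass, which is why the early approximation retains some high frequency content instead of only the lowest spectral band.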

Progressive  JPEG  may  be  viewed  as  providing  a  "preview"  image  and  then 
successively  decreasing  the  distortion  by  transmitting  additional  coefficients.  A  similar 
approach in layered coding is to transmit a base layer and then an enhancement layer that
mitigates errors in the base layer.

Rhee  and  Gibson  [13]  have  proposed  a  two-layer  coding  scheme  targeting  ISDN, 
enabling  support  for  one  or  both  B  channels  dependent  on  the  available  capacity  (64-128 
kbps).  One  channel  transmits  an  H.261  encoded  base  layer  while  the  other  channel  sends 
an  enhancement  layer  constrained  to  no  more  than  64  kbps.  As  H.261  is  similar  to  the 
H.263  codec  described  in  Section  III.E,  only  the  enhancement  layer  is  covered  here. 

After  encoding  a  frame,  an  H.261  coder  decodes  the  frame  to  serve  as  a  local 
reference  for  motion  compensation  when  encoding  the  next  frame  [67].  Rhee  and 
Gibson's  proposed  coding  scheme  compares  the  original  frame  to  the  decoded  frame  and 
determines  the  MSE  introduced  by  coding  for  each  block.  The  block  errors  are  sorted 
from  highest  to  lowest,  and  the  B  blocks  with  the  highest  error  are  selected  for 
enhancement.  While  the  number  of  blocks  selected  is  fixed  (160  in  the  simulations),  the 
location  of  the  blocks  varies  each  frame  depending  on  scene  content.  After  the  blocks  are 
selected,  b  bits  are  allocated  to  each  block  such  that  Bb  equals  the  desired  bit  rate  per 
frame.  The  bits  are  allocated  to  encode  the  error  at  each  pixel  within  a  selected  block 
based  on  a  bit  allocation  scheme  that  considers  the  observed  error  variance  at  each  pixel 
in  test  video  sequences.  Pixels  demonstrating  larger  error  variances  are  allocated  a 
greater  proportion  of  the  bits;  the  bit  assignment  remains  constant  throughout  the  video 
session. 
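
The block selection step of this scheme can be sketched as below. The sketch is a hedged illustration, not Rhee and Gibson's implementation: the function names, the toy frame, and the assumed frame rate of 10 frames/s are hypothetical, while the 160 selected blocks and the 64 kbps enhancement channel come from the description above. The per-pixel bit allocation based on error variance is not modeled.

```python
def block_mse(original, decoded, top, left, size):
    """Mean squared error between two frames over one size x size block."""
    total = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            diff = original[r][c] - decoded[r][c]
            total += diff * diff
    return total / (size * size)


def select_blocks(original, decoded, size, num_blocks):
    """Return (top, left) corners of the num_blocks blocks with the largest MSE."""
    errors = []
    for top in range(0, len(original) - size + 1, size):
        for left in range(0, len(original[0]) - size + 1, size):
            errors.append((block_mse(original, decoded, top, left, size), top, left))
    errors.sort(key=lambda e: e[0], reverse=True)      # highest error first
    return [(top, left) for _, top, left in errors[:num_blocks]]


# Toy 4x4 frame split into 2x2 blocks; the decoded frame has two damaged pixels.
original = [[10] * 4 for _ in range(4)]
decoded = [row[:] for row in original]
decoded[0][0] = 14     # block at (0, 0): MSE = 16 / 4
decoded[2][3] = 12     # block at (2, 2): MSE = 4 / 4
selected = select_blocks(original, decoded, size=2, num_blocks=2)

# Equal split of the per-frame budget so that B * b matches the budget.
frame_budget_bits = 64000 // 10        # 64 kbps at an assumed 10 frames/s
bits_per_block = frame_budget_bits // 160
print(selected, bits_per_block)
```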

Another  proposed  layered  refinement  scheme  based  on  H.261  from  Rhee  and 
Gibson  [68]  uses  the  refinement  layer  to  more  accurately  describe  motion  present  within 
the  frame.  H.261  performs  motion  compensation  at  the  16x16  macroblock  level,  which 
sacrifices  the  more  precise  motion  information  available  using  8x8  blocks  but  is  faster 
computationally [67]. The enhancement layer considers the displacement of the
individual blocks comprising a macroblock and yields more accurate motion prediction
and better visual quality.*

The  baseline  H.261  coder  performs  macroblock  level  motion  prediction  by 
comparing  the  current  16x16  macroblock  to  every  macroblock  in  the  previous  frame  and 
selecting  the  best  match.  The  difference  between  the  macroblocks  is  quantized,  encoded, 
and  stored  along  with  the  macroblock  motion  vector.  In  a  parallel  operation,  block-level 
motion  prediction  is  performed  for  the  four  blocks  comprising  the  current  macroblock. 
The  macroblock  motion  vector  is  subtracted  from  each  of  the  individual  block  motion 
vectors,  giving  four  residual  motion  vectors.  The  residual  motion  vectors  are  stored 
along  with  their  respective  encoded  difference  blocks  in  the  refinement  layer.  At  the 
decoder,  both  the  baseline  H.261  and  refinement  streams  are  decoded  simultaneously. 
Within  the  H.261  stream,  the  macroblock  motion  vectors  and  associated  difference 
macroblocks  are  used  to  update  the  current  frame.  If  the  refinement  layer  contains 
information  for  a  particular  macroblock,  the  baseline-decoded  blocks  are  replaced  with 
updated  blocks  using  the  block-level  motion  vectors. 
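
The residual motion vector arithmetic described above is small enough to show directly. In this sketch the function names and the example vectors are hypothetical; only the subtraction at the encoder and the matching addition at the decoder are taken from the description.

```python
def residual_vectors(macroblock_mv, block_mvs):
    """Encoder side: subtract the macroblock vector from each block vector."""
    mx, my = macroblock_mv
    return [(bx - mx, by - my) for bx, by in block_mvs]


def refined_vectors(macroblock_mv, residuals):
    """Decoder side: recover block vectors from the baseline vector and residuals."""
    mx, my = macroblock_mv
    return [(mx + rx, my + ry) for rx, ry in residuals]


macroblock_mv = (3, -2)                                # carried in the base stream
block_mvs = [(3, -2), (4, -2), (3, -1), (5, -3)]       # one per 8x8 block
residuals = residual_vectors(macroblock_mv, block_mvs) # carried in the refinement layer
recovered = refined_vectors(macroblock_mv, residuals)
print(residuals, recovered)
```

Because the block vectors usually lie close to the macroblock vector, the residuals tend to be small, which keeps the refinement layer inexpensive to encode.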

2.         Pyramid  Coding 

The  pyramid  coding  scheme  proposed  by  Burt  and  Adelson  [69]  extends  well  to  a 
layered  representation  of  still  images  and  has  been  extended  into  the  temporal  domain  for 
video  coding  [48].  Pyramid  coding  employs  a  simple  but  effective  prediction  scheme. 
The  image  is  low-pass  filtered,  decimated  by  a  factor  of  two,  and  then  quantized.  The 
result  is  a  base  image  that  is  a  coarse  representation  of  the  original.  Next,  the  base  image 
is  interpolated  back  to  the  original  image's  resolution,  filtered,  and  subtracted  from  the 
original  image  to  produce  a  prediction  error.  If  the  image  has  a  low  frequency 
characteristic,  usually  a  good  assumption,  the  error  image  is  highly  correlated  and 
compresses  very  well.  The  base  image  is  stored  or  transmitted  using  lossless 
compression while the error image is compressed using a lossy coder. At the decoder,
the error image is added to an interpolated version of the base image to reconstruct the
original image. Although pyramid coding is lossy, the error results only from
quantization of the error image, which may be bounded through proper choice of the
quantizer.

* H.263 offers block-level motion compensation as an option [56].

The  previous  description  applies  to  one-step  pyramid  coding.  A  multi-step 
pyramid  is  implemented  by  successively  repeating  the  filtering  and  decimation  operations 
until the desired size base image is produced; each step reduces the size of the image by a
factor of four. For an n-step pyramid, the result is a heavily filtered base image and a series of
n - 1 error images. The drawback to a multi-step pyramid is increased computational
demand  as  well  as  increased  encoding  delay  and  increased  over-sampling  of  the  image. 
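
A one-step pyramid can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the original scheme: the low-pass filter is a 2x2 mean, interpolation is plain pixel replication, the base image is kept unquantized, and the error image passes through a uniform quantizer with a hypothetical step size. It is intended only to show that the reconstruction error is bounded by the quantizer applied to the error image.

```python
def decimate(img):
    """Low-pass filter with a 2x2 mean and decimate by two in each dimension."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]


def interpolate(base):
    """Expand by a factor of two using pixel replication."""
    out = []
    for row in base:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def quantize(img, step):
    """Uniform quantizer; each value moves by at most step / 2."""
    return [[step * round(v / step) for v in row] for row in img]


def encode(img, step):
    base = decimate(img)
    predicted = interpolate(base)
    error = [[p - q for p, q in zip(irow, prow)] for irow, prow in zip(img, predicted)]
    return base, quantize(error, step)


def decode(base, quantized_error):
    predicted = interpolate(base)
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(predicted, quantized_error)]


image = [[10, 12, 20, 22],
         [14, 16, 24, 26],
         [50, 52, 60, 62],
         [54, 56, 64, 66]]
step = 4
base, q_error = encode(image, step)
restored = decode(base, q_error)
worst = max(abs(a - b) for ra, rb in zip(image, restored) for a, b in zip(ra, rb))
print(base, worst)
```

A multi-step pyramid would apply `decimate` repeatedly and keep one error image per step, at the cost of the additional computation and delay noted above.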

The  CafeMocha  encoder  [70]  uses  pyramid  coding  to  form  two  layers,  and  each 
layer  is  transmitted  to  a  separate  multicast  group  using  two  RTP  sessions.  CafeMocha 
transmits  video  at  a  resolution  of  320x240  with  4  bits/pixel.  The  base  layer  uses  the 
popular  CU-SeeMe  video  coder  [1]  at  a  lower  resolution  of  160x120,  and  the 
enhancement  layer  uses  a  pyramidal  coder  to  improve  the  resolution  to  320x240.  The 
CU-SeeMe  coding  algorithm  uses  block  replenishment  followed  by  lossless  compression. 
A  320x240  frame  is  first  decimated  to  obtain  a  160x120  base  frame.  Each  8x8  block  in 
the  base  frame  is  then  compared  to  its  counterpart  in  the  last  base  frame  and  is  selected 
for  transmission  if  the  difference  exceeds  a  threshold.  The  selected  blocks  are  losslessly 
compressed  and  placed  into  packets  of  no  greater  than  1000  bytes  to  avoid  fragmentation 
along  the  transmission  path. 
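
The block replenishment test can be sketched as shown below. The description above states only that a block is sent when its difference from the previous base frame exceeds a threshold, so the sum of absolute differences used here, the threshold value, and the function name `changed_blocks` are assumptions made for illustration.

```python
def changed_blocks(current, previous, size, threshold):
    """Return (top, left) corners of blocks whose summed absolute
    difference from the previous frame exceeds the threshold."""
    selected = []
    for top in range(0, len(current) - size + 1, size):
        for left in range(0, len(current[0]) - size + 1, size):
            sad = sum(abs(current[r][c] - previous[r][c])
                      for r in range(top, top + size)
                      for c in range(left, left + size))
            if sad > threshold:
                selected.append((top, left))
    return selected


previous = [[100] * 16 for _ in range(16)]
current = [row[:] for row in previous]
for r in range(8, 16):             # change confined to the lower-right 8x8 block
    for c in range(8, 16):
        current[r][c] = 110
current[0][0] = 101                # small noise in the upper-left block
selected = changed_blocks(current, previous, size=8, threshold=64)
print(selected)
```

Only the block containing the real change is selected; the block with minor noise stays below the threshold and is not retransmitted.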

Instead  of  forming  an  error  frame,  the  pyramid  coder  generates  error  blocks. 
Each  8x8  block  selected  for  transmission  in  the  base  layer  is  interpolated  to  give  a  16x16 
macroblock.  The  interpolated  macroblock  is  then  subtracted  from  the  corresponding 
macroblock  in  the  320x240  image  to  form  an  error  macroblock.  The  difference  block  is 
losslessly  compressed  using  run-length  coding  and  packetized  as  above.  The  results  in 
[70]  indicate  that  the  addition  of  a  second  layer  improves  visual  quality  compared  to  a 
320x240  CU-SeeMe  video  stream  when  subjected  to  a  50%  packet  loss  rate. 
Gharavi  and  Partovi  have  proposed  a  multi-grade,  layered  coding  scheme  that 
combines  elements  of  pyramid  and  subband  coding  along  with  DPCM  [71].  Instead  of 
providing  increasing  grades  of  quality  at  a  fixed  resolution,  the  coder  provides  scalable 
resolutions  and  accepts  lower  image  quality  at  higher  resolutions.  Three  layers  are 
employed: a base layer (L1) and two contribution layers (C1 and C2). The different
resolutions are obtained by combining the appropriate layers prior to the decoder as
indicated in Table IV.1.


Quality Grade    Resolution    Layers Required
Q1               352x240       L1
Q2               704x480       L1+C1
Q3               1408x960      L1+C1+C2

Table IV.1: Resolutions Supported in Gharavi and Partovi's Layered Coder.

Video  is  captured  at  the  highest  resolution  (Q3)  and  low-pass  filtered  and 
decimated  to  obtain  the  next  lower  grade  (Q2),  which  is  in  turn  low-pass  filtered  and 
decimated  to  obtain  the  lowest  quality  video  (Q1).  Q1  is  encoded  using  a  hybrid 
DCT/DPCM  scheme  compatible  with  H.261.  The  Q2  and  Q3  video  streams  are  encoded 
separately  but  in  the  same  manner  using  hybrid  subband/DPCM  encoders. 

3.  Wavelet  and  Subband  Coding 

Wavelet  and  subband  coding  provide  a  good  starting  point  for  designing  a  layered 
coder  since  each  image  or  frame  is  resolved  into  a  series  of  subbands  that  follow  a  strict 
hierarchy [48]. As discussed in Section III.D, a single-level wavelet decomposition of an
image  yields  an  average  subband  LL,  representing  the  low  pass  frequency  components  of 
the  image,  and  the  detail  subbands  LH,  HL,  and  HH,  representing  higher  frequency  detail 
in  the  horizontal,  vertical,  and  diagonal  directions,  respectively.  The  following  is  one  of 
several  approaches  to  realize  a  simple  layered  coder  using  a  wavelet  transform: 

•  Compress  each  frame  separately  by  using  the  wavelet  transform. 

•  Quantize  and  entropy  encode  each  subband  separately. 
•  Form  three  layers  based  on  the  frequency  content:  a  base  layer  (LL  subband), 
a  first  enhancement  layer  (LH  and  HL  subbands),  and  a  second  enhancement 
layer  (HH  subband). 

A  coder  employing  this  approach  is  shown  in  Figure  IV.4.  At  the  receiver,  the 
layers  are  decoded  and  inverse  wavelet  transformed  prior  to  video  display.  If  any  layers 
are  dropped  due  to  bandwidth  (or  possibly  errors),  those  wavelet  coefficients  are  assumed 
to  be  zero  and  the  frame  is  reconstructed  using  the  remaining  detail  subbands. 
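
The three-layer arrangement and the zero-fill rule at the receiver can be sketched with a one-level Haar decomposition. This is a minimal illustration under stated assumptions: quantization and entropy coding are omitted, the subbands are scaled as averages so that the inverse is exact, and the assignment of the LH and HL names to column and row differences is a convention chosen here, since naming varies between references.

```python
LAYERS = [("LL",), ("LH", "HL"), ("HH",)]   # base, first and second enhancement layers


def haar2d(frame):
    """One-level 2-D Haar decomposition of a frame with even dimensions."""
    rows, cols = len(frame) // 2, len(frame[0]) // 2
    sub = {n: [[0.0] * cols for _ in range(rows)] for n in ("LL", "LH", "HL", "HH")}
    for r in range(rows):
        for c in range(cols):
            a, b = frame[2 * r][2 * c], frame[2 * r][2 * c + 1]
            p, q = frame[2 * r + 1][2 * c], frame[2 * r + 1][2 * c + 1]
            sub["LL"][r][c] = (a + b + p + q) / 4   # local average
            sub["LH"][r][c] = (a - b + p - q) / 4   # difference between columns
            sub["HL"][r][c] = (a + b - p - q) / 4   # difference between rows
            sub["HH"][r][c] = (a - b - p + q) / 4   # diagonal difference
    return sub


def inverse_haar2d(sub):
    """Exact inverse of haar2d."""
    rows, cols = len(sub["LL"]), len(sub["LL"][0])
    frame = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            ll, lh, hl, hh = (sub[n][r][c] for n in ("LL", "LH", "HL", "HH"))
            frame[2 * r][2 * c] = ll + lh + hl + hh
            frame[2 * r][2 * c + 1] = ll - lh + hl - hh
            frame[2 * r + 1][2 * c] = ll + lh - hl - hh
            frame[2 * r + 1][2 * c + 1] = ll - lh - hl + hh
    return frame


def decode_layers(sub, layers_received):
    """Reconstruct a frame, treating subbands of dropped layers as zero."""
    kept = {name for layer in LAYERS[:layers_received] for name in layer}
    rows, cols = len(sub["LL"]), len(sub["LL"][0])
    zeros = [[0.0] * cols for _ in range(rows)]
    return inverse_haar2d({n: (sub[n] if n in kept else zeros) for n in sub})


frame = [[10, 12, 20, 22],
         [14, 16, 24, 26],
         [50, 52, 60, 62],
         [54, 56, 64, 66]]
subbands = haar2d(frame)
full = decode_layers(subbands, 3)        # all three layers received
base_only = decode_layers(subbands, 1)   # enhancement layers dropped
print(base_only)
```

With only the base layer, each 2x2 neighborhood collapses to its average, which is the coarse but usable picture a bandwidth-limited recipient would see.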
Figure  IV.4:  Basic  Layered  Video  Coder  Using  Wavelets. 

If  more  layers  are  desired,  the  process  can  be  repeated  at  the  coder  by  applying 
the  wavelet  transform  to  the  average  (LL)  subband  to  generate  four  higher  order 
subbands.  Following  the  approach  outlined  above,  the  compressed  video  could  be 
transmitted  using  as  many  as  seven  distinct  layers. 
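
The count of seven follows because each additional decomposition level replaces the LL subband with four new subbands, a net gain of three. As a worked check, writing N(n) for the number of subbands after n levels (a symbol introduced here only for illustration, and assuming one layer per subband):

```latex
\[
N(n) = 3n + 1, \qquad N(1) = 4, \qquad N(2) = 7 .
\]
```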

Bahl  and  Hsu  have  proposed  a  wavelet-based  layered  coder  incorporating  content 
sensitive  spatial  decomposition  and  multiresolution  coding  [72].  Spatial  decomposition 
is  performed  via  a  split-and-merge  algorithm  [73].  A  frame  is  split  into  blocks  of 
identical  size  and  then  adjacent  blocks  of  similar  variance  are  merged  to  generate  regions 
of  common  perceptual  importance.  After  applying  the  algorithm,  the  results  are  saved  as 
a  segmentation  mask  and  reused  for  subsequent  frames.  A  new  segmentation  mask  is 
only  calculated  if  significant  motion  occurs  within  the  frame. 

The  coder  decomposes  each  block  using  the  fast  Haar  transform  (FHT)  and  then 
applies  motion  compensation,  quantization,  and  variable-length  coding  to  each  subband. 
Bit  allocation  is  performed  in  proportion  to  the  variance  exhibited  within  each  subband. 
Transmission  is  prioritized  by  subband  and  region  and,  optionally,  the  receiver  can 
request  priority  updates  for  regions  corrupted  by  packet  loss  within  the  network. 

McCanne et al. have performed the most extensive work on the problem of
multicast video by proposing the RLM architecture for delivering multicast video over
heterogeneous  networks  [12].  In  a  follow-on  work,  the  authors  break  the  multicast  video 
problem  into  two  areas,  the  compression  problem  and  the  transport  problem,  and  propose 
a  comprehensive  solution  for  both  problems  [45].  The  compression  problem  is  met  with 
their  proposed  hybrid  DCT/wavelet  layered  codec.  The  codec  provides  robust  error 
resilience,  low  coder  complexity  for  good  run-time  performance,  and  acceptable 
compression  performance. 

Error  resilience  is  provided  through  macroblock-based  conditional  replenishment 
wherein  only  the  macroblocks  that  change  in  the  current  frame  are  encoded  for 
transmission.  While  block  replenishment  does  not  offer  the  same  compression  gain 
available  with  motion  compensation,  the  authors  argue  that  the  difference  is  negligible 
compared  to  improved  quality  when  considering  packet  loss. 

After  blocks  are  selected  for  replenishment,  they  are  compressed  spatially  using  a 
hybrid  DCT/wavelet  scheme.  Each  16x16  macroblock  is  decomposed  into  four 
subbands.  The  LL  band  is  created  using  a  1/3/3/1  biorthogonal  wavelet,  and  the 
remaining  subbands  are  created  using  the  discrete  Haar  transform  [48].  The  HH  band 
contributes  little  energy  to  the  reconstructed  frame  and  is  discarded.  The  LL  block  is 
further  transformed  with  a  DCT  and  the  resulting  coefficients  are  progressively  encoded 
using  spectral  selection.  The  remaining  LH/HL  subbands  are  combined  and  are  also 
progressively  encoded  using  embedded  zero-trees. 

Once  all  selected  blocks  within  the  current  frame  are  encoded,  a  spatio-temporal 
hierarchy  is  created  combining  spatial  and  temporal  layering.  Within  each  encoded 
block,  the  progressively  encoded  DCT  and  wavelet  coefficients  are  organized  into  a 
number of spatial layers. The possible combinations of bit rate between spatial and
temporal layers form a two-dimensional region where every trajectory provides a
compromise between visual quality and the rate of frame updates at increasing bit rates.
C.         A  LOW-COMPLEXITY  ADAPTIVE  LAYERED  CODER  DESIGN 

In  this  section,  we  propose  a  new  layered  coder  design.  The  goals  in  proposing  a 
new  coder  are  threefold.  First,  tactical  considerations  limit  transmission  bandwidth  and 
place  a  premium  on  robust  transmission.  These  considerations  determine  the  type  of 
compression  techniques  that  are  desirable  or  even  feasible  in  a  tactical  video  coder. 
Second,  previously  reported  layered  coding  efforts  are  very  diverse  with  emphasis  on 
different  network  architectures  or  applications.  Consensus  on  identifying  a  structured 
approach  to  designing  layered  coders  or  quantifying  those  parameters  that  make  a  layered 
coder  effective  is  lacking.  Third,  a  working  coder  provides  a  source  for  gathering 
statistical  traffic  data  that  is  used  in  later  chapters  to  model  layered  video  traffic  for 
network  simulations  and  to  examine  error  concealment  issues.  A  working 
implementation  of  this  coder  is  provided  by  [74]  and  was  used  to  evolve  the  design. 

The  guidelines  observed  in  designing  the  layered  coder  flow  from  both  the  tactical 
VTC  application  and  the  considerations  for  designing  an  effective  layered  coder.  The 
application  imposes  the  following  requirements.  First,  the  coder  must  adaptively 
optimize  compression  for  both  low  motion  video  and  static  slides.  Second,  the  coder 
must  possess  a  low  complexity  architecture  to  minimize  coding  delays  and  power 
requirements.  Third,  the  coder  must  provide  error  resilient  decoding  at  high  packet  loss 
rates.  Fourth,  the  coder  must  constrain  the  bit  rate  to  a  predetermined  average.  Finally, 
the coder must meet the performance specifications listed in Table I.1.

Implementing  an  effective  coder  within  the  above  constraints  involves  due 
consideration  of  the  following  elements.  First,  the  coder  should  transmit  a  base  layer 
with  acceptable  quality  and  two  (or  more)  enhancement  layers  such  that  each 
progressively  improves  perceptual  quality.  Second,  the  coder  should  minimize  the 
bitstream  overhead  required  to  accommodate  the  layering  structure. 

A functional diagram of the proposed coder is shown in Figure IV.5. Details for
each component are provided in subsequent sections.

Figure IV.5: Functional Block Diagram of the Hybrid FHT/DCT Layered Coder.

1.         Block  Selection  for  Motion  Compensation 

Given  the  assumption  of  low  activity  video,  temporal  compression  is  provided 
through  a  simple  block  selection  (updating)  scheme  that  encodes  only  those  macroblocks 
that  show  significant  changes  frame-to-frame.  For  low  activity  video,  block  selection 
yields  only  slightly  inferior  compression  performance  relative  to  motion  prediction 
schemes [53]. Block selection also provides greater robustness, since interframe error
propagation is greatly limited and intraframe error propagation is eliminated. Block
updating also eliminates the need for a locally decoded reference frame. This greatly simplifies the coder since an
inverse  quantization/transform  loop  is  not  required.  Block  selection  is  considered  here 
solely  with  regard  to  video  sequences.  Static  sequences  exhibit  little  or  no  motion  and 
consequently  make  little  use  of  block  selection.  Indeed,  most  transmissions  that  occur 
during  static  sequences  arise  from  the  considerations  presented  in  the  next  section  that 
require  the  inclusion  of  a  block-aging  algorithm. 

Motion  is  detected  by  applying  a  distance  metric  between  successive  frames.  The 
distance  between  each  macroblock  in  the  current  frame  and  its  counterpart  in  the  previous 
frame  is  calculated  and  the  result  compared  to  a  threshold.  To  decrease  computational 
expense, the distance metric is applied to individual 8x8 blocks within the macroblock;
the first block to exceed the threshold triggers selection and ends the search, thus avoiding
the expense of examining the remaining blocks. To further decrease computational
expense, distance calculations are confined to the luminance component of each pixel,
even if color components are present, since the human visual system is more sensitive to
changes in luminance [6].

Since  motion  in  VTC  scenes  tends  to  be  confined  to  discrete  objects  within  the 
scene,  as  opposed  to  scene  motion  caused  by  a  camera  pan,  search  efficiency  is  slightly 
affected  by  the  order  in  which  the  individual  blocks  are  examined.  The  more  efficient 
approach is to maximize the distance between the first two blocks examined. As shown
in Figure IV.6, two search patterns can be considered: a cross-pattern search that
examines  the  upper  left  block  followed  by  the  lower  right  and  a  clockwise  search  starting 
from  the  upper  left.  In  the  test  video  sequences  examined,  for  those  macroblocks  selected 
due  to  motion,  the  cross-pattern  search  resulted  in  a  2.5%  decrease  in  the  average  number 
of  blocks  examined  per  frame  compared  to  the  clockwise  search.  The  result  was  a  net 
decrease  of  one  block  per  frame.  Of  course,  the  decrease  depends  on  motion  content; 
with  increasing  motion,  the  difference  becomes  negligible. 

Figure IV.6: Block Search Order: a) Clockwise Search and b) Cross-pattern Search.


A  much  greater  improvement  is  realized  by  using  the  cross-pattern  search  but 
changing  the  starting  block  of  each  macroblock  each  frame  to  match  the  anticipated 
motion  at  that  point  in  the  frame.  Again,  motion  in  VTC  sequences  is  fairly  confined. 
For  example,  a  speaker  shifts  left  to  right  and/or  slightly  up  and  down.  Consequently, 
macroblocks  tend  to  be  selected  in  the  same  manner.  Therefore,  search  speed  is 
increased by having the coder store the identity of the specific block, termed the "anchor"
block, that caused a particular macroblock to be selected in the previous frames. For
each  macroblock  in  the  new  frame,  the  block  selection  algorithm  starts  from  the  anchor 
block.  If  the  anchor  block  causes  selection  or  if  the  macroblock  is  not  selected,  the 
anchor  block  identity  is  unchanged.  If  another  block  causes  selection,  the  anchor  block 
identity  is  updated.  Using  this  search  scheme  produced  an  additional  20%  improvement 
in  the  number  of  blocks  searched  and  resulted  in  10  fewer  blocks  searched  per  frame  on 
average.  A  more  complex  approach  not  examined  here  is  to  remember  the  two  blocks 
that  most  frequently  caused  selection  and  tailor  the  search  accordingly.  The  resulting 
tailored  search  would  be  clockwise,  counter-clockwise,  or  cross-pattern. 
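
The anchor-block search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder's actual implementation: the block numbering, the `OPPOSITE` table, and the `block_changed` predicate (which stands in for applying the distance metric to one 8x8 block and comparing against the threshold) are all hypothetical names introduced here.

```python
# Assumed numbering of the four 8x8 blocks in a 16x16 macroblock:
# 0 = upper left, 1 = upper right, 2 = lower right, 3 = lower left.
OPPOSITE = {0: 2, 1: 3, 2: 0, 3: 1}


def cross_order(start):
    """Cross-pattern order: start block, its diagonal opposite, then the rest."""
    first_pair = [start, OPPOSITE[start]]
    rest = [b for b in range(4) if b not in first_pair]
    return first_pair + rest


def select_macroblock(block_changed, anchor):
    """Return (selected, new_anchor, blocks_examined) for one macroblock.

    block_changed(b) is a hypothetical predicate: True when the distance
    metric for 8x8 block b exceeds the selection threshold.
    """
    for examined, b in enumerate(cross_order(anchor), start=1):
        if block_changed(b):
            # The first block over threshold ends the search and
            # becomes the anchor for the next frame.
            return True, b, examined
    # Macroblock not selected: the anchor identity is unchanged.
    return False, anchor, 4


# Motion confined to the lower right block; the stored anchor is upper left.
selected, anchor, examined = select_macroblock(lambda b: b == 2, anchor=0)
print(selected, anchor, examined)
# On the next frame the search starts at the new anchor.
print(select_macroblock(lambda b: b == 2, anchor=anchor))
```

When motion recurs in the same part of a macroblock, the second call finds it on the first block examined, which is the saving the anchor scheme is meant to capture.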

The  distance  metric  employed  is  the  non-normalized  ASD  given  by  [6]: 

$$\mathrm{ASD} = \left| \sum_{m=1}^{M} \sum_{n=1}^{N} \left( x_{m,n} - x^{R}_{m,n} \right) \right| \qquad \text{(IV.1)}$$

where $x_{m,n}$ and $x^{R}_{m,n}$ represent the pixel intensities in the predicted and reference blocks,
respectively. This expression of ASD differs from the form given by Eq. (III.10) in that
the  result  is  not  normalized  by  the  number  of  pixels  and  the  reference  macroblock  is  not 
offset.  The  non-normalized  version  is  used  since  the  normalization  factor  is  easily 
included  in  the  threshold  value,  saving  the  cost  of  a  floating  point  division  operation  or, 
at  least,  a  right-shift  operation.  The  ASD  is  employed  due  to  computational  efficiency  as 
it  only  requires  additions  and  subtractions  along  with  a  single  absolute  value  operation. 
SAD  requires  an  equal  number  of  arithmetic  operations  but  requires  MN  - 1  more 
absolute  value  operations.  Further,  since  the  ASD  takes  the  absolute  value  of  only  the 
sum,  it  acts  like  an  accumulator  and  provides  a  low-pass  filtering  effect  that  removes 
noise  added  to  pixel  intensities  through  video  capture.  Smoothing  prevents  spurious 
block  selection  in  otherwise  static  screen  regions  that  could  occur  in  other  metrics,  such 
as SAD or MSE, where non-linear operations on a per-pixel basis tend to accumulate
noise  energy.  This  allows  bandwidth  to  be  more  effectively  devoted  to  regions  of  greater 
interest  [45]. 
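
The difference between the two metrics can be illustrated with a small sketch. The function names and the test blocks below are assumptions made for illustration; the noise pattern is an idealized zero-mean checkerboard rather than measured capture noise.

```python
def asd(cur, ref):
    """Non-normalized ASD of Eq. (IV.1): absolute value of the summed differences."""
    return abs(sum(c - r
                   for row_c, row_r in zip(cur, ref)
                   for c, r in zip(row_c, row_r)))


def sad(cur, ref):
    """Sum of absolute differences: one absolute value per pixel."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


# Reference 8x8 block of uniform intensity.
ref = [[100] * 8 for _ in range(8)]
# Static block with idealized zero-mean +/-3 noise (checkerboard pattern).
noisy = [[100 + (3 if (m + n) % 2 == 0 else -3) for n in range(8)]
         for m in range(8)]
# Block whose intensity genuinely shifted by +3 everywhere.
moved = [[103] * 8 for _ in range(8)]

print("noisy:", asd(noisy, ref), sad(noisy, ref))
print("moved:", asd(moved, ref), sad(moved, ref))
```

In this constructed case SAD cannot distinguish the noisy static block from the shifted block, while the ASD of the noisy block cancels to zero, which is the accumulator behavior the text describes.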

The relative selectivity of ASD and SAD was tested by determining the relative
thresholds required to deliver approximately the same quality, as measured by pSNR, and
then comparing the resulting block selection rates and patterns. Examining Figure IV.7, a
threshold index below 8-10 was required to adequately capture scene motion.
In this region, ASD selects 1-2 more macroblocks compared to SAD. However,
examining the macroblocks selected confirmed that ASD tended to better capture speaker
motion while SAD's selections were more diffuse. As a result, notwithstanding the
pSNR equivalence, video compressed using ASD was judged more visually pleasing.
The  difference  in  bandwidth  appears  negligible  considering  the  vast  decrease  in 
computational  effort  required  by  ASD. 

Figure IV.7: Comparison of ASD and SAD for Block Selection.

Two independent elements affect video quality and thus required bit rate:
adequate motion detection to prevent "jerky" motion in the reconstructed video and
control of distortion introduced by quantization. The goal in motion detection is to
select the maximum block selection threshold that adequately captures motion. In the
video sequences examined, a threshold of 160 proved adequate. At this threshold value,
an average of 24.8 macroblocks was selected per frame in test video sequences.* In
practice,  a  user  selectable  threshold  would  prove  beneficial  by  allowing  the  sender  to 
compromise  between  motion  selection  and  visual  distortion  given  a  set  bit  rate. 

2.  Aging  Algorithm 

Motion  compensation  using  only  block  refreshment  through  the  selection  scheme 
described  above  presents  some  problems  [45].  Consider  an  arbitrary  macroblock  whose 
content  is  changing  due  to  motion  within  the  frame.  The  macroblock  travels  from  its 
initial  state  along  some  trajectory  to  a  final  state  once  the  motion  has  stopped.  At  some 
point  in  the  trajectory,  the  block  selection  algorithm  forces  an  update  to  the  macroblock. 
Once the final state is reached, hysteresis occurs if the distance between the final and
updated states is not sufficient to force block selection, that is, if the distance is less than
the threshold. In this case, the macroblock is not selected for updating, and the displayed
macroblock at the receiver is left with a persistent error. Another problem occurs when
new participants are allowed to join a VTC in progress (dynamic multicast) [45]. Since
the coder is only transmitting those macroblocks selected due to motion, new participants
receive only a portion of the current scene. With low activity video, the end result is a patchy
"disembodied"  speaker.  The  final  problem  is  the  duration  of  error  artifacts  due  to 
missing  or  corrupt  packets  at  the  receiver.  Artifacts  created  in  the  active  portion  of  the 
scene  tend  to  last  for  only  a  single  frame  since  block  updates  occur  frequently.  However, 
errors  in  less  dynamic  regions  tend  to  persist  longer  since  the  frequency  of  updates  is 
correspondingly  lower.  Due  to  lower  motion  content,  each  of  these  problems  is  of  greater 
concern  during  static  sequences  since  the  block  updating  scheme  selects  either  a  few 
macroblocks  to  transmit,  given  an  in-screen  cursor,  or  none  at  all. 

Coupling the block update scheme with an aging scheme that forces periodic
updates of each macroblock alleviates these problems. The general principle is that the
coder tracks the time interval or age since each macroblock was last selected. If a
macroblock's age exceeds a predetermined interval, that macroblock is selected by
default. Aging thus guarantees a maximum period between macroblock updates. This
bounds both the duration of hysteresis errors and visual artifacts caused by losses and
errors during transmission. The bound also ensures that new viewers receive an entire
frame in a timely manner.

* Note to the average of 24.8 macroblocks per frame reported for block selection: more macroblocks are actually selected because of the forced selections introduced by aging.

Obviously,  aging  increases  bandwidth  requirements,  but  the  impact  is  lessened  by 
the  manner  in  which  macroblocks  are  selected  through  aging  and  the  length  of  the  aging 
interval.  Spreading  block  selections  evenly  over  time  is  desirable  to  avoid  spikes  in  bit 
rate,  which  in  turn  requires  a  scheme  that  ages  each  block  independently.  Simply 
choosing to update a block after n frames pass without an update leads to an undesirable
correlation  in  updates  following  each  scene  change.  Even  though  motion  within  the 
scene  tends  to  randomize  updates  to  some  extent,  a  sufficiently  static  background  would 
still  lead  to  correlation  of  a  significant  fraction  of  block  updates.  The  worst  case  is 
represented  by  a  scene  change  where  the  new  scene  is  entirely  static,  such  as  a  slide 
presentation.  In  this  case,  the  bitrate  spikes  every  n  frames.  Increasing  the  aging  interval 
decreases  bandwidth  but  increases  the  duration  of  visual  errors  and  degrades  response 
time  for  new  participants. 

The aging algorithm used in the coder does not track the age of each macroblock
directly.  Instead,  each  macroblock  has  an  entry  in  an  update  table  identifying  the  number 
of  frames  remaining  until  that  macroblock  must  be  updated.  As  each  frame  passes 
without  an  update,  the  entry  is  decremented  by  one.  As  each  macroblock  is  processed  for 
block  selection  in  a  given  frame,  the  coder  examines  the  macroblock's  entry  in  the  update 
table.  If  its  corresponding  entry  has  reached  zero,  the  macroblock  is  selected  for 
transmission.  Otherwise,  the  distance  metric  is  applied  to  determine  if  the  macroblock 
should  be  selected  due  to  motion.  The  order  of  the  two  events  is  important.  Since  the 
distance  metric  does  not  need  to  be  calculated  for  those  macroblocks  selected  due  to 
aging,  the  result  is  a  net  decrease  in  the  number  of  calculations  required  to  select 
macroblocks  for  transmission.  In  either  case,  after  a  macroblock  has  been  transmitted,  a 
new update is scheduled m frames in the future, where m is a discrete uniform random
variable distributed in the range [1, n]. Pseudocode for this algorithm is listed in Figure
IV.8. The update interval is initialized to 0 at the start of coding in recognition that all
macroblocks in the first frame must be coded.


initialize update_table[99] to 0;

for each frame k
    % Process each macroblock in frame
    for each MB j = 1 to 99
        % Count down to next forced update
        update_table[j] -= 1
        % Check for forced update (<= 0 so that the initial value of 0
        % forces every macroblock in the first frame)
        if update_table[j] <= 0
            encode block
            update_table[j] = random update
        % Check for block selection
        else if distance(MB j) > threshold
            encode block
            update_table[j] = random update
        end
    end
end

Figure  IV.8:  Pseudocode  for  Aging  Algorithm. 
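
A runnable rendering of Figure IV.8 is sketched below. It is an illustration under assumptions rather than the coder's source: `code_frame`, the `distances` list (a stand-in for computing the distance metric per macroblock), and the seeded generator are names introduced here. The forced-update test uses `<= 0` so that a table initialized to 0 selects every macroblock in the first frame, as the text requires.

```python
import random

NUM_MB = 99    # macroblocks per frame, as in Figure IV.8
N_MAX = 20     # aging bound n


def code_frame(update_table, distances, threshold, rng, n=N_MAX):
    """Return the indices of the macroblocks selected in one frame.

    distances[j] is a hypothetical, precomputed distance for macroblock j;
    a real coder would evaluate the metric only when it is needed.
    """
    selected = []
    for j in range(len(update_table)):
        update_table[j] -= 1                  # count down to next forced update
        forced = update_table[j] <= 0         # aging is checked first
        # Short-circuit: the distance is not consulted when the update is forced.
        if forced or distances[j] > threshold:
            selected.append(j)
            update_table[j] = rng.randint(1, n)   # uniform on [1, n], inclusive
    return selected


# Entirely static scene: every selection after the first frame is due to aging.
rng = random.Random(0)
table = [0] * NUM_MB
static = [0] * NUM_MB
counts = [len(code_frame(table, static, 160, rng)) for _ in range(2000)]
steady = sum(counts[100:]) / len(counts[100:])
print("first frame:", counts[0])
print("steady-state selections per frame: %.2f" % steady)
```

For a static scene the steady-state average should sit near the aging load predicted for n = 20, and no macroblock waits more than n frames between updates.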


Using  a  uniform  distribution  to  schedule  updates  smoothes  block  selections  over  n 
frames  and  decorrelates  the  selection  of  individual  macroblocks  through  aging.  Choosing 
aging  intervals  randomly  also  prevents  events,  such  as  scene  changes,  from  correlating 
updates  and  generating  spikes  in  bit  rate.  The  value  chosen  for  n  controls  the  tradeoff 
between  additional  bandwidth  required  and  coder  responsiveness.  For  a  given  value  of  n, 
the number of additional macroblocks selected through aging per frame, $N_a$, is

$$N_a = \frac{2 N_{MB}}{n + 1} \qquad \text{(IV.2)}$$

where $N_{MB}$ is the number of macroblocks in the frame. Actually the bandwidth impact is
lower  since  some  of  the  blocks  selected  via  aging  would  have  been  selected  anyway  due 
to  scene  motion. 

For  the  video  sequences  examined  in  this  work,  n  was  set  to  20.  This  value  offers 
an  acceptable  compromise  between  bandwidth,  corresponding  to  an  additional  9.43 
macroblocks per frame, and responsiveness. New VTC participants are guaranteed to
receive a complete frame within 2 seconds at 10 fps, and the duration of visual errors is bounded by the
same  value. 
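
As a worked check of these figures, assume the 99 macroblocks per frame used in Figure IV.8. The update interval m is uniform on [1, n], so its mean is (n + 1)/2, and each macroblock is forced once per mean interval:

```latex
\begin{align*}
E[m] &= \frac{n + 1}{2} = \frac{21}{2} = 10.5 \ \text{frames}, \\
N_a  &= \frac{N_{MB}}{E[m]} = \frac{2 \cdot 99}{21} \approx 9.43 \ \text{macroblocks per frame}, \\
T_{\max} &= \frac{n}{\text{frame rate}} = \frac{20 \ \text{frames}}{10 \ \text{frames/s}} = 2 \ \text{s}.
\end{align*}
```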

3.  Layering  Strategy 

Macroblocks  selected  for  transmission  are  decomposed  into  layers  using  a 
wavelet  transform.  Since  the  selection  process  takes  place  before  the  transform  stage,  the 
transform  is  only  applied  to  those  macroblocks  requiring  transmission.  A  wavelet-based 
approach  was  chosen  since  frequency  decomposition  offers  the  most  flexibility  in 
populating layers. A macroblock may reasonably be decomposed into as many as sixteen
4x4 subbands, using a uniform decomposition, which then may be combined in various
manners to create an arbitrary number of layers (up to sixteen). The challenge is in
determining an appropriate number of layers and apportionment of the frequency content
within the macroblock across those layers.

As  layers  are  hierarchical  in  importance,  layer  assignments  should  map  frequency 
content  to  that  hierarchy  in  a  manner  consistent  with  perceptual  importance.  Just  as 
important,  the  bit  rate  allocation  resulting  from  the  layer  assignments  should  be 
segregated such that dropping a layer offers the potential for decreasing congestion. In
practice,  meeting  these  expectations  with  a  single  layering  scheme  proved  impractical. 
Therefore,  two  specific  layering  schemes  were  required:  one  for  video  sequences  and  one 
for  static  presentation  slide  sequences. 

For  both  types  of  sequences,  layering  is  accomplished  through  application  of  the 
fast  Haar  transform  (FHT)  to  each  selected  macroblock.  The  FHT  is  the  simplest  possible 
wavelet  algorithm  [60]  and  is  described  by 
(IV.4)

where $\mathbf{x}_0$ is the original data vector, and vectors $\mathbf{x}^a$ and $\mathbf{x}^d$ are the average and detail
decomposition vectors, respectively. The FHT has several desirable properties with
regard  to  minimizing  coder  complexity.  First,  the  FHT  is  a  real  transform,  so  no  complex 
arithmetic  is  required  and  storage  is  simplified.  Second,  the  FHT  is  not  computationally 
demanding as its application requires only addition, subtraction, and left- and right-shifts.
Finally, unlike more sophisticated wavelet transforms, the FHT does not require
extending  or  padding  the  data  set.  However,  the  simplicity  of  the  FHT  can  lead  to 
blocking  artifacts  at  high  compression  levels  since  the  average  and  detail  calculations  are 
confined  only  to  contiguous  pixels. 
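
The shift-and-add character of a Haar step can be sketched as follows. The scaling used here is an assumption and may differ from Eqs. (IV.3) and (IV.4): the block implements the integer variant sometimes called the S-transform, with the average taken as a right-shifted sum and the detail as a plain difference. The function names are hypothetical.

```python
def fht_step(x):
    """One Haar analysis step on an even-length list of integers.

    Assumed scaling: average = (a + b) >> 1, detail = a - b, so that
    only additions, subtractions, and shifts are needed.
    """
    avg = [(x[i] + x[i + 1]) >> 1 for i in range(0, len(x), 2)]
    det = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return avg, det


def fht_inverse(avg, det):
    """Exact integer inverse of fht_step."""
    out = []
    for s, d in zip(avg, det):
        a = s + ((d + 1) >> 1)   # floor shift restores the bit lost in the average
        b = a - d
        out += [a, b]
    return out


row = [5, 2, 2, 5, 100, 103, 7, 7]
avg, det = fht_step(row)
print(avg, det)
print(fht_inverse(avg, det) == row)
```

Each output value depends only on one pair of contiguous samples, so no extension or padding of the data is needed, and the same locality is what allows blocking artifacts at high compression.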

Since video information is two-dimensional, Eq. (IV.3) and Eq. (IV.4) can be
applied to each dimension independently, resulting in four uniform subbands as discussed
in Section III.F. A key difference from that discussion is that the average and detail
equations are applied to individual macroblocks instead of the entire frame. The resulting
average (LL) subband and the three detail subbands (HL, LH, and HH) are each 8x8 in
size. The actual operations required to generate each subband and the physical
significance of each subband are given in Table IV.2.


Subband   Detail       Horizontal Operation   Vertical Operation

LL        Lowpass      Average                Average
LH        Horizontal   Average                Detail
HL        Vertical     Detail                 Average
HH        Diagonal     Detail                 Detail

Table  IV.2:  Significance  and  Determination  of  Wavelet  Subbands. 
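
The operations in Table IV.2 can be sketched for a single macroblock as below. The helper names are hypothetical, and the 1/2 scaling of both the average and the detail is an assumption that may differ from Eqs. (IV.3) and (IV.4). Following the table, the first letter of each subband name gives the horizontal operation and the second the vertical operation.

```python
def haar_pairs(v):
    """Average and detail of contiguous pairs (assumed 1/2 scaling)."""
    avg = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    det = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return avg, det


def transpose(m):
    return [list(col) for col in zip(*m)]


def split_rows(block):
    """Horizontal operation: apply haar_pairs along every row."""
    pairs = [haar_pairs(row) for row in block]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def split_cols(block):
    """Vertical operation: apply haar_pairs along every column."""
    lo, hi = split_rows(transpose(block))
    return transpose(lo), transpose(hi)


def decompose(macroblock):
    """First-order decomposition of a 16x16 macroblock into four 8x8 subbands."""
    h_avg, h_det = split_rows(macroblock)
    ll, lh = split_cols(h_avg)   # horizontal average, then vertical average/detail
    hl, hh = split_cols(h_det)   # horizontal detail, then vertical average/detail
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


# Macroblock of horizontal stripes: rows alternate between 0 and 10.
stripes = [[10 if m % 2 else 0] * 16 for m in range(16)]
bands = decompose(stripes)
for name, band in bands.items():
    print(name, len(band), "x", len(band[0]), "first value:", band[0][0])
```

For this striped block all of the detail lands in LH, the subband that Table IV.2 associates with horizontal detail, while HL and HH are zero.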

The  coder  restricts  the  number  of  layers  to  three.  The  decision  to  consider  no 
more  than  three  layers  was  driven  by  the  limited  bandwidth  available.  Each  layer 
consumes an equal amount of bandwidth in overhead. While a greater number of layers
offers  more  flexibility  in  managing  quality  and  congestion,  at  64-96  kbps,  three  layers 
appears  to  be  the  limit  in  terms  of  producing  layers  that  provide  a  perceptible 
improvement  in  quality. 

The  initial  layering  strategy  considered  for  both  the  video  and  the  static  slide 
sequences  performs  only  a  first  order  analysis  of  each  selected  macroblock,  generating 
the subbands listed in Table IV.2. Each subband generated is assigned to a layer as
shown in Table IV.3. The layer assignments are intended to promote a graceful increase
in  quality  by  progressively  adding  frequency  content.  The  base  layer  is  essentially  a 
lowpass-filtered  version  of  the  original  macroblock,  and  the  two  enhancement  layers 
successively  add  in  higher  frequency  details.  Since  the  LL  subband  retains  many  of  the 
perceptual  properties  of  the  original  macroblock,  the  LL  subband  is  transformed  further 
using  the  2-D  DCT.  The  additional  transform  allows  the  LL  subband  to  be  processed 
using  JPEG,  an  approach  that  exploits  that  standard's  emphasis  on  maximizing  retention 
of  the  most  perceptually  relevant  information. 


Layer               Subband(s) Included

Base                LL
1st Enhancement     LH, HL
2nd Enhancement     HH

Table  IV.3:  Preliminary  Layer  Assignments 

Preliminary results for the initial layering approach were disappointing. For
video sequences, the base layer gave acceptable quality and the first
enhancement layer produced a marked improvement in quality. However, the bit rate
allocated to the second enhancement layer by this assignment scheme was small (< 10%),
and application of the layer only occasionally produced a perceptible improvement in
quality. For static slide sequences, the situation was reversed. Slides consist of text and
line drawings, which exhibit a different frequency characteristic than motion video. The
preponderance of sharp edges, in all directions, increases the relative importance of
higher frequency content in these frames relative to motion video frames. As a result, the
hierarchy given in Table IV.3 is reasonable for motion video but unsuited for static
sequences. With only the base layer, which lacks high frequency content, text and block diagrams were
blurry and indistinct. Even adding the first enhancement layer yielded only a marginal
improvement. Indeed, only the final addition of diagonal detail produced acceptable
quality.

The  results  indicate  that  a  frequency-based  hierarchical  scheme  designed  for 
motion  video  is  unsuitable  for  static  sequences.  Although  examined  further  below,  the 
converse  also  appears  to  be  true.  Therefore,  separate  layering  schemes  were  formulated 
for  each  sequence.  The  coder  deduces  the  type  of  sequence  present  and  applies  the 
appropriate  layering  scheme. 

The  ad  hoc  approach  presented  above  indicates  the  need  for  a  more  general 
technique  for  determining  an  appropriate  layering  structure  for  a  video  stream.  The 
problem  is  to  determine,  given  that  n  layers  are  desired,  to  what  degree  a  selected 
macroblock  is  decomposed  and  how  the  resulting  subbands  are  allocated  to  each  layer. 
Here,  we  propose  a  variant  of  the  split-and-merge  algorithm  [73]  applied  at  the 
macroblock  level.  Instead  of  applying  the  algorithm  in  the  spatial  domain  to  identify 
regions  of  equivalent  activity,  the  algorithm  is  applied  to  selected  macroblocks  in  the 
frequency  domain  to  identify  regions  of  similar  energy  and  perceptual  content. 
Essentially,  the  macroblock  is  split  into  equal  segments  using  the  FHT,  subbands  of 
approximately  equal  variance  are  grouped,  and  the  resulting  regions  are  allocated  to 
individual  layers.  At  this  point,  dynamically  changing  the  layering  structure  is  not 
permitted. 

Given  a  representative  video  sequence,  the  first  step  of  the  algorithm  is  to  split 
each  macroblock  using  the  FHT.  The  macroblock  is  split  into  equal  subbands  by 
recursively  applying  the  FHT  to  each  subband  until  the  desired  number  of  subbands  is 
created.  For  example,  a  first  order  decomposition  of  the  macroblock  creates  four  8x8 
subbands  (LL,  LH,  HL,  HH).  A  second  order  decomposition  of  each  of  these  subbands 
creates sixteen 4x4 subbands as shown in Figure IV.9. Continuing the example, a second
order decomposition of the LL subband results in the LLLL, LLLH, LLHL, and LLHH
subbands. Likewise, a third order decomposition produces 64 2x2 subbands. In practical
application,  stopping  at  a  second  order  decomposition  proved  sufficient  for  three  layers. 
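
The bookkeeping of the uniform split can be sketched as follows. The function name is hypothetical; the naming convention simply appends one two-letter code per decomposition order, matching the LLLL, LLLH, LLHL, LLHH example above.

```python
from itertools import product

FIRST_ORDER = ["LL", "LH", "HL", "HH"]


def uniform_subbands(order, mb_size=16):
    """Names and side length of the subbands after a uniform split of a macroblock."""
    names = ["".join(parts) for parts in product(FIRST_ORDER, repeat=order)]
    side = mb_size >> order          # each order halves the subband side length
    return names, side


for order in (1, 2, 3):
    names, side = uniform_subbands(order)
    print(order, len(names), f"{side}x{side}", names[:4])
```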

Using  the  representative  video  sequences,  the  variance  of  the  coefficient  set 
comprising  each  subband  is  determined  across  all  frames  of  video.  Using  subband 
variance as a metric to form layers offers two benefits. First, with motion video, variance
appears to decrease as spatial frequency increases and thus tracks perceptual
importance. Therefore, differences in variance provide a convenient mechanism for
assigning subbands to a layered hierarchy. Second, grouping subbands with a similar
variance is convenient since each group can employ a common quantizer. Several
quantizer schemes allocate bits by varying quantizer step size in inverse proportion to
variance. This approach uses variance as an indication of the dynamic range exhibited by
the  coefficients.  One  such  scheme,  described  later,  apportions  bits  in  an  attempt  to 
balance  distortion  introduced  across  each  subband  [48]. 


[Figure: 16x16 macroblock -> FHT -> 8x8 subbands -> FHT -> 4x4 subbands]

Figure IV.9: Splitting a Macroblock into Uniform Subbands.

The subband variances, computed using several test video sequences, after first
order decomposition, are shown in Table IV.4. The subband variances after a second
order decomposition are shown in Table IV.5. Subband variance provides a good
indication of energy concentration within each subband. Since the video images are
lowpass, the energy is concentrated in the lowest subband as shown in Table IV.4. By
extension, the subband variance also provides an indication of relative perceptual
importance, an observation that allows subband variance to dictate layer assignments. A
second order decomposition further differentiates the frequency content found in the first
order subbands. For example, after a second level decomposition of the LH subband,
energy is now concentrated in the LHLL and LHLH subbands. Values in Table IV.5
resemble the transpose of Figure III.4 and demonstrate that subband variance strongly
tracks the visual components in the macroblock. This strengthens the argument for using
subband variances to make layer assignments in a hierarchical manner for video
sequences.


$\sigma^2_{LL}$     $\sigma^2_{LH}$     $\sigma^2_{HL}$     $\sigma^2_{HH}$

2891        52.0        73.3        12.4

Table IV.4: Subband Variances after a First Order Decomposition (Motion Video).

2702.0     57.7   |    19.2     21.6
 117.4     12.5   |     4.5      6.8
------------------+------------------
  27.7      6.0   |     2.2      2.8
  31.5      8.0   |     3.2      4.2

Table IV.5: Subband Variances after a Second Order Decomposition (Motion Video).


After  variance  data  has  been  gathered  for  each  subband  at  the  desired  level  of 
analysis, the next step is to group adjacent subbands exhibiting similar variances. The
criterion suggested by [73] is to group adjacent subbands $k_1$ and $k_2$ with variances $\sigma^2_{k_1}$
and $\sigma^2_{k_2}$, respectively, when

and $\sigma^2_{\max}$ and $\sigma^2_{\min}$ represent the maximum and minimum variances found among the
subbands. The parameter $N_b$ represents the total number of subbands.

Grouping of subbands based on the variances in Table IV.5 results in the
partitions shown in Figure IV.10. Assuming that each subband is independent, the
variance of each partition $P_k$ is simply the sum of the variances for the subbands $k_i$
comprising that partition:

$$\sigma^2_{P_k} = \sum_{i} \sigma^2_{k_i} \qquad \text{(IV.7)}$$

Since  the  subbands  comprising  each  partition  have  similar  variances,  each  partition  can 
be  quantized  using  the  same  scheme  such  that  quantization  errors  are  spread  uniformly 
among  the  subordinate  subbands. 
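
The additive form of Eq. (IV.7) can be checked against the reported data. The sketch below assumes that the four groups of values in Table IV.5 correspond, in order, to the second order splits of the LL, LH, HL, and HH subbands, and that the transform scaling preserves the variance sum; both are assumptions, and the variable names are introduced here for illustration.

```python
# First-order variances from Table IV.4.
first_order = {"LL": 2891.0, "LH": 52.0, "HL": 73.3, "HH": 12.4}

# Second-order variances from Table IV.5, in groups of four.
# Assumption: the groups appear in the order LL, LH, HL, HH.
second_order = {
    "LL": [2702.0, 57.7, 117.4, 12.5],
    "LH": [19.2, 21.6, 4.5, 6.8],
    "HL": [27.7, 6.0, 31.5, 8.0],
    "HH": [2.2, 2.8, 3.2, 4.2],
}


def partition_variance(variances):
    """Eq. (IV.7): partition variance as the sum of its subband variances."""
    return sum(variances)


for band, parent in first_order.items():
    merged = partition_variance(second_order[band])
    rel = abs(merged - parent) / parent
    print(f"{band}: merged = {merged:.1f}, first order = {parent:.1f}, "
          f"relative difference = {rel:.2%}")
```

Under these assumptions, merging each group of four second order subbands recovers the corresponding first order variance to within rounding of the tabulated values.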

Figure IV.10: Partitions Resulting from Merge Algorithm.

Next, the resulting partitions are assigned to layers $L_j$ until the requisite number of
layers  are  created  using  the  following  set  of  heuristic  rules: 


Rule 1: No layer may have a greater variance than any lower layer. That is, given
N layers,

$$\sigma^2_{L_1} \geq \sigma^2_{L_2} \geq \cdots \geq \sigma^2_{L_N} \qquad \text{(IV.8)}$$

Rule 2: Layers must be populated in order of increasing frequency content. A
layer  may  not  contain  a  partition  of  lower  frequency  content  than  any  layer  below 
it. 

Rule 3: Partitions that meet the criterion given by Eq. (IV.7) are assigned to the
same  layer  even  if  the  partitions  are  non-contiguous. 

Rule  4:  Partitions  are  applied  to  layers  in  a  symmetric  fashion. 

Rule  5:  If  more  than  two  subbands  comprising  a  coarser  subband  remain  as 
partitions  after  applying  the  above  rules,  group  all  of  the  partitions  comprising  the 
coarser  subband  together  into  one  partition. 

Rule  6:  If  one  or  more  partitions  are  moved  between  layers,  as  required  to 
achieve  a  more  balanced  distribution  of  bit  rates  or  quality,  move  the  partition(s) 
with  the  lowest  variance  if  promoting  to  a  higher  layer  and  the  partition(s)  with 
highest  variance  if  demoting  to  a  lower  layer. 

The reasoning behind these rules stems from the requirements stated for layered
coder  design  at  the  start  of  this  section.  Rule  1  ensures  that  no  upper  layer  receives  a 
greater  bit  allocation  than  the  lower  layers.  This  provides  a  more  logical  sequence  to  the 
layer  hierarchy  since  the  lower  layers  will  make  a  greater  contribution  to  reconstructed 
quality,  and  quality  loss  due  to  layer  dropping  is  more  gradual.  Rule  2  matches  the  layer 
hierarchy  to  the  observed  frequency  dependence  displayed  by  the  human  visual  system 
(HVS)  and  ensures  a  more  graceful  degradation  in  quality  during  periods  of  congestion. 
Rule 3 simplifies quantizer design by allowing non-contiguous partitions to use the same
quantization scheme. Rule 4 ensures that neither horizontal nor vertical detail dominates a
partially reconstructed frame. A lack of balance between these components distorts the
image and causes scene elements to appear elongated. Simplifying coder design and
minimizing processing delay are the main considerations for Rule 5. Each distinct
partition  or  subband  transmitted  requires  overhead  within  the  bit  stream  for  the  decoder 
to  correctly  position  the  contributions.  A  greater  number  of  subbands  also  complicates 
quantizer  design  and  rate  control.  Concatenating  the  single  subband  partitions  into  their 
coarser,  parent  subband  offsets  these  concerns  and  reduces  the  computational  burden 
required  to  transform  the  macroblock  since  an  analysis  step  is  dropped. 

Rules  1-5  help  determine  an  effective  layering  scheme  for  motion  video. 
However, implementation provides the final test of effectiveness. Two problems may
arise during implementation, as discovered in the first ad hoc approach attempted. The
resulting bit rate for a layer may be so small that bitstream overhead is too high, or a
layer may appear to have a negligible impact on reconstructed quality. In either case, the
solution is to reduce the number of layers by concatenating the ineffectual layer with an
adjacent layer or to move partitions between layers. The latter situation is covered under
Rule  6,  which  provides  guidance  for  moving  partitions  between  layers  without  violating 
the  other  rules. 

Application of these rules to the partitions shown in Figure IV.10 resulted in the
final layering scheme for motion video sequences shown in Figure IV.11. The LL
subband is assigned to layer I and further transformed via DCT as previously discussed.
The HH subband is assigned to layer III in its entirety. The HL and LH subbands are
further decomposed. The resulting subbands are partitioned and assigned to layers II and
III. The layer assignments in Figure IV.11 also provide the basis for the quantization
scheme discussed in the next section.

Figure IV.11: Final Layering Scheme for Motion Video Sequences.

The  generalized  layering  scheme  presented  above  is  biased  for  motion  video 
sequences. Consequently, the layering scheme presented in Figure IV.11 is not suitable
for static slide sequences. Static slide sequences show a much greater dependence on
higher frequency components for perceptual recognition since text and line drawings
have  a  much  higher  preponderance  of  edge  detail.  Any  hierarchical  scheme  based  on  the 
lowpass  nature  exhibited  by  images  yields  a  blurred  reproduction  with  only  the  lower 
layers  and  gives  satisfactory  results  only  when  the  high  frequency  layers  are  added.  For 
example,  applying  the  motion  video  layering  scheme  to  slides  containing  text  and  line 
drawings  only  gives  acceptable  results  when  all  three  layers  are  received.  Obviously,  this 
defeats  the  purpose  of  layering  video.  Therefore,  a  different  layering  scheme  is 
appropriate  if  the  video  stream  is  to  include  both  types  of  sequences. 

Although  the  general  layering  scheme  presented  above  is  not  applicable  to  static 
slide  sequences,  application  of  the  split-and-merge  algorithm  is  still  meaningful.  The 
variances exhibited by the subbands generated after a first and second level analysis of
slide sequences consisting of text and line drawings are shown in Table IV.6 and Table
IV.7, respectively. Comparing these values to those for the motion video sequences
given  earlier,  it  is  evident  that  energy  is  much  more  evenly  distributed  among  the 
different  subbands.  The  result  promotes  a  much  more  complex  relationship  between 
variance  and  perceptual  importance  which  is  demonstrated  in  the  close  interdependence 
between  the  various  subbands. 

Table IV.6: Subband Variances after a First Level Decomposition (Slide Sequence).

Table IV.7: Subband Variances after a Second Level Decomposition (Slide Sequence).

Applying the split-and-merge algorithm results in the partitions shown in Figure
IV.12. Using the layer assignment rules outlined above, partitions P1, P2, and P4 are
assigned to the base layer. However, reconstruction based solely on the base layer gives
very poor results. Even adding partitions P3, P5, and P6 fails to achieve acceptable results
even  though  such  an  arrangement  includes  a  large  portion  of  the  energy  contained  in  the 
macroblock.  Therefore,  unlike  in  the  motion  video  case,  variance  alone  provides  a  very 
poor  guide  to  determining  perceptual  relevance.  Instead,  achieving  acceptable 
reconstruction  starting  with  the  base  layer  requires  contributions  from  each  of  the  8x8 
subbands. In practice, the layering scheme shown in Figure IV.13 was found to be
suitable.  The  base  layer  consists  of  those  4x4  subbands  containing  the  most  significant 
details  as  determined  by  variance.  Although  in  motion  sequences  the  LLLL  subband  is 
expected  to  have  a  lowpass  frequency  characteristic  consistent  with  the  original 
macroblock,  this  does  not  hold  true  with  the  static  sequences.  Therefore,  application  of 
the  DCT  provides  no  additional  benefit.  The  remaining  subbands  are  divided  between 
the  remaining  layers  in  order  of  increasing  frequency  content. 

Figure IV.12: Partitions Resulting from Merge Algorithm (Slide Sequence).

Figure IV.13: Final Layering Scheme for Static Slide Sequences.

Although the partitions in Figure IV.12 do not directly lead to a satisfactory
layering arrangement, continuing the examination does lead to a simple quantization
scheme. After merging partitions with similar variances, the partitions have been reduced
to those shown in Figure IV.14. Although partitions P2 and P3 are not close enough for
merging, given Eq. (IV.5), they are sufficiently close in variance such that the simplicity
gained by quantizing both bands together balances any possible sub-optimal bit
allocation. The final partitions, for the purpose of quantization, are shown in Figure
IV.15.

Figure IV.14: Partitions Remaining After Merging Similar Non-Contiguous Partitions.

Figure IV.15: Partitions for the Purpose of Quantization.

Since two different layering schemes are used, the coder requires a criterion for
determining which type of video is present. The determination is made following each
scene change. The coder judges that a scene change has occurred if the number of
macroblocks selected exceeds some threshold. After examining the block selection
statistics for motion video, a threshold three standard deviations above the mean
block selection rate was found to be high enough to avoid spurious scene change detections. If a
scene  change  has  occurred,  the  coder  examines  the  number  of  macroblocks  selected  due 
to  motion  in  the  next  frame.  If  the  value  is  zero,  the  current  sequence  is  assumed  to  be 
static  since  obviously  no  motion  has  occurred  within  the  scene.  Otherwise,  the  sequence 
is  assumed  to  be  a  motion  video  sequence. 
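
The decision logic just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the coder's actual code: the function names are hypothetical, the population standard deviation is assumed (the text does not say which estimator was used), and the block-selection counts are invented.

```python
from statistics import mean, pstdev
from typing import Sequence


def scene_change_threshold(selected_counts: Sequence[int], k: float = 3.0) -> float:
    # Threshold set k standard deviations above the mean number of
    # macroblocks selected per frame (k = 3 in the text).
    return mean(selected_counts) + k * pstdev(selected_counts)


def is_scene_change(n_selected: int, threshold: float) -> bool:
    # A scene change is declared when the selection count exceeds the threshold.
    return n_selected > threshold


def classify_sequence(motion_blocks_next_frame: int) -> str:
    # After a scene change, zero motion-selected macroblocks in the next
    # frame means the new scene is treated as a static slide.
    return "static" if motion_blocks_next_frame == 0 else "motion"


# Hypothetical block-selection statistics from motion video frames.
history = [10, 12, 11, 9, 13]
thr = scene_change_threshold(history)
if is_scene_change(99, thr):
    print(classify_sequence(0))
```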

4. Quantization and Lossless Coding

After  the  transform  stage,  individual  subbands  are  quantized  and  losslessly  coded 
according  to  their  layer  assignment  (motion  video  sequences)  or  partition  assignment 
(static  sequences).  The  main  difference  is  that  the  base  layer  for  motion  video  sequences 
is  encoded  using  the  JPEG  standard.  Otherwise,  uniform  quantization  is  used  with  a 
single  step  size  for  each  layer/partition  followed  by  Huffman  coding. 

The quantization and coding stage for motion video macroblocks is shown in
Figure IV.16. The LL subband coefficients are quantized and encoded using the
luminance  quantization  array  and  luminance  VLC  table  suggested  in  [75].  This  process 
is  summarized  in  Chapter  III. 

The remaining subbands are uniformly quantized using a fixed quantizer step size
for all coefficients in that subband. The value of the quantizer step size is set
independently for Q1 and Q2, and all subbands entering a particular quantizer use a
common step size.

Figure IV.16: Quantization and Coding for Motion Video Macroblocks.

Unlike  in  JPEG  encoding,  zig-zag  scanning  of  the  quantized  FHT  coefficients 
provides  no  apparent  coding  gain.  Instead,  trials  indicated  that  a  simpler  horizontal  raster 
scan  was  adequate  for  all  bands  except  the  HL  subband.  The  HL  subband  showed  a 
slight  preference  for  a  vertical  raster  scan,  which  seems  consistent  given  the  frequency 
orientation of this band. The scan orders are summarized in Table IV.8, where the scan
order applies to the subband indicated as well as all child subbands. The LL entry
pertains only to coding of static macroblocks and is included for completeness.

Parent Subband    Scan Order
LL                Raster
LH                Raster
HL                Vertical Raster
HH                Raster

Table IV.8: Scan Order for Encoding Quantized Coefficients.
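
The uniform quantizer and the scan orders of Table IV.8 can be illustrated as follows. This is a sketch only: the rounding rule (round half away from zero) is an assumption, since the text specifies only a single step size per quantizer, and the function names and sample block are hypothetical.

```python
from typing import List


def quantize(block: List[List[float]], step: float) -> List[List[int]]:
    # Uniform quantization with one step size for every coefficient.
    # Assumption: magnitudes are rounded to the nearest integer level.
    def q(c: float) -> int:
        mag = int(abs(c) / step + 0.5)
        return mag if c >= 0 else -mag
    return [[q(c) for c in row] for row in block]


def scan(block: List[List[int]], parent_subband: str) -> List[int]:
    # Table IV.8: HL and its children use a vertical (column-by-column)
    # raster scan; LL, LH and HH use a horizontal (row-by-row) raster scan.
    if parent_subband == "HL":
        rows, cols = len(block), len(block[0])
        return [block[r][c] for c in range(cols) for r in range(rows)]
    return [v for row in block for v in row]


coeffs = [[8.0, -3.9], [0.4, 12.1]]   # hypothetical FHT coefficients
levels = quantize(coeffs, step=4.0)
print(scan(levels, "LH"), scan(levels, "HL"))
```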

After  scanning,  each  non-zero  coefficient  is  losslessly  coded  using  a  Huffman 
VLC  code.  The  coding  scheme  chosen  mirrors  the  3-D  event  structure  employed  by  the 
H.263  coding  standard.  Each  non-zero  coefficient  is  replaced  by  an  equivalent  event 
described  by  three  parameters  [56]:  {LAST,  RUN,  LEVEL}  where  LAST  indicates 
whether  there  are  any  more  non-zero  coefficients  in  the  current  subband;  RUN  indicates 
the  number  of  successive  zeros  that  precede  the  non-zero  coefficient;  and  LEVEL 
represents  the  non-zero  magnitude  of  the  quantized  coefficient.  Each  event  maps  to  a 
VLC  codeword  to  which  a  sign  bit  is  appended  to  represent  the  sign  of  the  coefficient.  A 
VLC  table  was  derived  for  motion  sequences  using  a  series  of  representative  test 
sequences  [74]. 
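
The conversion of a scanned coefficient list into {LAST, RUN, LEVEL} events can be sketched as below. The VLC tables derived from the test sequences are not reproduced in the text, so the sketch stops at the event level; the sign-bit convention (1 for negative) and the function name are assumptions.

```python
from typing import List, Tuple


def to_events(scanned: List[int]) -> List[Tuple[int, int, int, int]]:
    # Returns one (LAST, RUN, LEVEL, sign) tuple per non-zero coefficient.
    # RUN counts the zeros preceding the coefficient, LEVEL is its magnitude,
    # and LAST is 1 only for the final non-zero coefficient in the subband.
    events = []
    run = 0
    for v in scanned:
        if v == 0:
            run += 1
        else:
            events.append([0, run, abs(v), 0 if v > 0 else 1])
            run = 0
    if events:
        events[-1][0] = 1
    return [tuple(e) for e in events]


print(to_events([2, 0, 0, -1, 0, 3, 0, 0]))
```

Trailing zeros after the last non-zero coefficient generate no event, which is where the run-length structure saves bits in sparse high-frequency subbands.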

The quantization and coding stage for static macroblocks is shown in Figure
IV.17. The major difference compared to motion macroblocks is that JPEG is not
employed. Instead, the sixteen subbands are supplied to one of three independent
uniform quantizers, Q1, Q2, and Q3, each with a fixed quantizer step size. After
quantization,  each  non-zero  coefficient  is  replaced  by  a  3-D  VLC  codeword  as  described 
above  although  a  different  VLC  table  is  employed.  Again,  the  VLC  table  was  developed 
from  a  series  of  representative  sequences  [74]. 

Neither Figure IV.16 nor Figure IV.17 indicates the presence of the control signal
from the Control Unit shown in Figure IV.5. The control signal allows manipulation of
the quantizer step sizes, or a scaling factor in the case of the JPEG quantizer, as required
by a rate control scheme. Rate control schemes are covered later in this chapter.

Figure IV.17: Quantization and Coding for Static Macroblocks.

RESULTS


This  section  includes  some  example  video  traces  for  a  short  video  segment 
consisting  of  100  frames  of  a  single  speaker  followed  by  50  frames  of  a  presentation  slide 
filled  with  line  diagrams  and  text.  A  sample  frame  from  each  sequence  is  shown  in 
Figure IV.18 and Figure IV.19. Each shows the original frame and the reconstructed
frame  with  only  the  base  layer  received,  the  base  layer  and  the  first  enhancement  layer 
received,  and  all  layers  received.  With  the  exception  of  scene  changes,  the  coder 
employed  no  rate  control  for  these  sequences;  a  single  set  of  quantizers  is  used  for  each 
sequence and not varied during the run. During a scene change, the first new frame of
the  scene  is  heavily  compressed  to  avoid  spikes  in  the  outgoing  bit  rate.  The  video 
quantizers  employed  produced  an  average  bit  rate  of  80  kbps  for  the  video  sequence  and 
40  kbps  for  the  static  sequence  although  the  bit  rate  would  be  expected  to  vary  locally. 

Figure IV.18: Original and Reconstructed Frames From a Motion Video Sequence.

Figure IV.20 and Figure IV.21 show the bit rate trace for the combined sequences
and the plot of pSNR as a measure of reconstructed video quality (see Eq. (III.7)). The
granularity in bit rate offered by a layered video hierarchy is evident in Figure IV.20; as
congestion  occurs,  the  lower  layers  could  be  retained  while  preserving  most  of  the 
quality.  The  bit  rate  ratio  among  layers  is  approximately  5:3:2  for  both  sequences.  As 
expected,  the  bit  rate  for  the  static  sequence  is  much  lower  since  the  bit  rate  results  solely 
from  macroblock  aging.  For  this  reason,  rate  control  is  not  of  significant  benefit  for  the 
static  sequences.  Using  a  pointer  within  the  overhead  slide  would  result  in  macroblocks 
selected  due  to  motion  and  increase  the  bit  rate  slightly,  but  bit  rate  would  still  not  reach 
the  level  displayed  for  the  motion  video  sequence. 

Figure IV.19: Original and Reconstructed Frames From a Static Video Sequence.

Figure IV.21 illustrates the progressive improvement in quality as additional
layers  are  added  to  the  base  layer.  At  the  beginning  of  each  sequence,  quality  in  terms  of 
pSNR  improves  sharply  over  the  aging  interval  following  a  scene  change.  After  this 
period,  quality  is  observed  to  remain  relatively  flat  for  each  sequence  regardless  of  the 
number  of  layers  as  expected  since  no  attempt  is  made  to  vary  bit  rate.  For  the  motion 
video  sequence,  the  base  layer  provides  a  smoothed  but  acceptable  display.  Text  in  the 
frame  is  not  readable,  but  the  speaker's  movements  are  easy  to  follow.  Adding  the  first 
enhancement  layer  improves  sharpness  and  adds  a  4  dB  improvement  in  pSNR  although 
small  text  is  still  difficult  to  discern.  The  second  enhancement  layer  only  adds  1-2  dB 
improvement  but  small  text  is  finally  readable  and  other  features  with  fine  edges  are 
sharper. With static video, the role of the enhancement layers is even more dramatic.
Even though most of the macroblock's energy is included in the base layer and
contributions  from  each  frequency  band  are  included,  the  base  layer  still  shows  a  large 
degree  of  smoothness  although  the  shapes  are  readily  identifiable.  Adding  the  first 
enhancement  layer  adds  a  7  dB  improvement  and  dramatically  improves  sharpness.  The 
final  layer,  even  though  the  bit  rate  contribution  is  the  smallest  of  the  three  layers,  almost 
doubles  the  pSNR,  and  the  reconstructed  frame  is  virtually  identical  to  the  original  frame. 

Figure IV.20: Bitrate per Frame for the Layered Video Sequence.

Figure IV.21: Reconstructed pSNR for the Layered Video Sequence.

E. SIMPLE LAYERED-VIDEO RATE CONTROL

Compressed  video  is  variable  bit  rate  by  nature  since  compression  gain  varies 
based  on  scene  activity  and  complexity.  However,  transmission  channels  inevitably 
require  some  constraints  on  bit  rate  because  of  channel  capacity  or  QoS  constraints. 
Most  commonly,  bit  rate  is  constrained  to  maintain  a  constant  rate  or  to  maintain  a 
constant  local-average  bit  rate  over  time.  Many  factors  affect  bit  rate,  but  the  most 
important  is  the  tradeoff  between  quantizer  step  size  and  image  fidelity.  A  larger  step 
size  results  in  a  lower  bit  rate  and  a  larger  amount  of  distortion.  Reducing  the  step  size 
increases  the  bit  rate  but  reduces  the  amount  of  distortion.  Rate  control,  therefore, 
requires evaluation of the rate-distortion relationship created by a particular coder design.
The rate control problem may be posed in terms of the rate-distortion relationship. The
goal of the encoder is to minimize distortion $D$ subject to a bit constraint $R_c$, i.e., $R \le R_c$
[53]. This problem is solved using Lagrangian optimization by expressing a cost function
in terms of a distortion term weighted against a rate term [48]. The optimal solution is
one that minimizes the cost function $J$, given by

$$J = D + \lambda R, \qquad \text{(IV.9)}$$

where $\lambda$ is the Lagrange multiplier. Expressing distortion as a function of rate, $D(R)$, and
differentiating both sides with respect to $R$ to find a minimum results in

$$\frac{dJ}{dR} = \frac{dD(R)}{dR} + \lambda = 0, \qquad \text{(IV.10)}$$

which indicates that each Lagrange multiplier $\lambda$ yields a particular optimal solution.
Each  tangential  point  on  the  rate-distortion  curve  therefore  corresponds  to  an  optimal 
solution  for  a  particular  rate  constraint.  Figure  IV.22  shows  a  possible  rate-distortion 
curve  and  an  optimal  solution  for  a  bit  rate  of  Rq.  While  the  true  rate-distortion  curve  is 
guaranteed  to  be  convex  [48],  the  operational  curve  is  influenced  by  the  coder  design, 
including  the  motion-prediction  scheme  employed,  the  quantizer  design,  and  lossless 
coding  gains.  Therefore,  rate  control  schemes  tend  to  only  approximate  the  rate- 
distortion  relationship  when  determining  a  method  for  varying  quantizer  step  size  to 
achieve  the  desired  bit  rate. 

Figure IV.22: Rate-Distortion Curve and a Possible Optimal Solution.
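
For an operational coder the rate-distortion curve is only known at a finite set of measured points, so the Lagrangian minimization reduces to a search over those points. The sketch below illustrates the idea; the function names and the (rate, distortion) values are hypothetical and are not measurements from this coder.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (rate, distortion)


def best_operating_point(points: List[Point], lam: float) -> Point:
    # Select the point minimizing the Lagrangian cost J = D + lambda * R.
    return min(points, key=lambda p: p[1] + lam * p[0])


def best_point_under_rate(points: List[Point], rate_limit: float) -> Optional[Point]:
    # Direct form of the constrained problem: least distortion with R <= R_c.
    feasible = [p for p in points if p[0] <= rate_limit]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical operating points of a coder.
rd_points = [(10.0, 100.0), (20.0, 40.0), (40.0, 20.0), (80.0, 15.0)]
for lam in (0.1, 0.5, 2.0, 10.0):
    print(lam, best_operating_point(rd_points, lam))
```

Sweeping the multiplier from small to large values moves the selected point from the high-rate end of the curve toward the low-rate end, which is the behavior a rate controller exploits.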


With  any  rate-control  scheme,  two  issues  are  of  importance.  First,  changes  to 
quantizer  step  size  must  be  communicated  to  the  decoder,  which  adds  to  the  coder's 
overhead  depending  on  how  often  the  parameter  is  changed.    Second,  rate-control 
schemes  must  be  kept  reasonably  simple  for  real-time  applications  to  minimize  coding 
delay. 

Numerous  feedback  control  schemes  for  rate  control  have  been  proposed  that 
track  actual  bit  allocation  in  some  manner  and  use  feedback  to  vary  quantizer  step  size. 
The  H.261  standard  [67]  suggests  an  approach  described  as  liquid  level  control  [6].  The 
H.261 reference coder examines the output buffer every 11 macroblocks. If the buffer is
full, quantizer step size is increased. If the buffer is nearing empty, quantizer step size is
decreased. H.261 leaves the actual rate control scheme up to the designer. One feasible
approach is the feedback control scheme proposed by Choi and Park [76] that controls the
Lagrange multiplier $\lambda$ based on the output buffer state. Low-delay rate control
approaches have been described by Telenor Research [55] and Ribas-Corbera and Lei [77]
for H.263 and H.263+, respectively. The Telenor approach linearizes the relationship
between quantizer size and bit rate. At the start of each frame, the coder determines the
deviation between the bits allocated to the last frame $B_{i-1}$ and the target bit allocation $\bar{B}$,
i.e.,

$$\Delta_1 B = B_{i-1} - \bar{B}. \qquad \text{(IV.11)}$$

The coder also attempts to allocate an equal number of bits to each macroblock while
encoding the current frame and tracks this deviation using the relationship

$$\Delta_2 B = B_{i,mb} - \frac{n_{mb}}{N_{mb}}\,\bar{B}, \qquad \text{(IV.12)}$$

where $n_{mb}$ represents the sequence number of the current macroblock and $N_{mb}$ the total
number of macroblocks. Then, at the beginning of each new macroblock, the coder
updates quantizer step size based on these deviations:

$$Q_{mb} = \bar{Q}_{i-1}\left(1 + \frac{\Delta_1 B}{2\bar{B}} + \frac{12\,\Delta_2 B}{R}\right), \qquad \text{(IV.13)}$$

where $R$ is the allocated channel bit rate and $\bar{Q}_{i-1}$ is the average quantizer size in the
previous frame. Telenor's approach gives an equal weighting to each macroblock. The
approach taken by [77] is similar but computes an optimal quantizer step size for each
macroblock within the bit budget using the variance exhibited by each macroblock as
well as a heuristic weight indicating the perceptibility of decoding artifacts.
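
The Telenor macroblock-level update of Eqs. (IV.11) through (IV.13) can be sketched as a single function. The constants used here (a weight of 1/(2 x target) on the frame-level deviation and 12/R on the macroblock-level deviation) follow one reading of Eq. (IV.13) and should be treated as an assumption, as should the interpretation of the running bit count as the bits spent so far in the current frame. All parameter names are hypothetical.

```python
def macroblock_quantizer(q_prev_avg: float,
                         bits_prev_frame: float,
                         bits_so_far: float,
                         mb_index: int,
                         mb_total: int,
                         target_bits: float,
                         channel_rate: float) -> float:
    # Frame-level deviation: bits used by the last frame minus the target.
    delta1 = bits_prev_frame - target_bits
    # Macroblock-level deviation: bits spent so far in this frame minus the
    # share of the target an even per-macroblock allocation would have used.
    delta2 = bits_so_far - (mb_index / mb_total) * target_bits
    # Scale the previous frame's average quantizer by both deviations.
    return q_prev_avg * (1.0 + delta1 / (2.0 * target_bits)
                         + 12.0 * delta2 / channel_rate)


# Hypothetical numbers: 64 kbps channel, 4000-bit frame target.
print(macroblock_quantizer(10.0, 5000.0, 2200.0, 50, 100, 4000.0, 64000.0))
```

A practical coder would also round the result and clamp it to the quantizer range permitted by the standard; that step is omitted here.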

The  issue  of  rate-control  for  layered  video  has  not  been  well  addressed  in  the 
literature.  The  rate-control  problem  is  somewhat  complicated  by  the  multi-dimensional 
aspect  of  the  rate-distortion  curve  expressing  overall  distortion  as  a  function  of  an  n- 
dimensional  set  of  quantizers.  In  the  coder  presented  here,  the  bit  rate  depends  on  a  set  of 
three  quantizers.  Two  approaches  are  presented  below.  The  first  is  based  on  a  traditional 
rate-distortion  approach  that  assumes  that  both  rate  and  distortion  for  each  layer  are 
additive.  The  second  approach  uses  vector  quantization  to  reduce  the  dimensionality  of 
the  control  problem  and  approximates  an  optimal  rate-distortion  curve. 

1. A Rate-Distortion Approach

For a layered coder, separate quantizers are employed for each layer. Assuming
that distortion for each layer $i$ is additive, the rate control problem becomes minimizing

$$D = \sum_{i=0}^{N-1} D_i \qquad \text{(IV.14)}$$

subject to the constraint

$$\sum_{i=0}^{N-1} R_i \le R. \qquad \text{(IV.15)}$$

The assumption that each layer behaves independently allows the cost function to be
rewritten as [48]

$$J_i(R_i) = D_i(R_i) + \lambda R_i, \qquad \text{(IV.16)}$$

$$J = \sum_{i=0}^{N-1} J_i. \qquad \text{(IV.17)}$$

Since the costs are additive, $J$ is minimized when each $J_i$ is minimized. Taking the
derivative of Eq. (IV.16) to find the minimum results in

$$\frac{dJ_i}{dR_i} = \frac{dD_i(R_i)}{dR_i} + \lambda = 0. \qquad \text{(IV.18)}$$

Therefore, a particular bit rate $R$ is optimal when each $R_i$ corresponds to points with the
same slope on their respective rate-distortion curves.

The distortion $D_i$ introduced by quantization is related to the rate $R_i$ by [78]

$$D_i(R_i) = C_i\,\sigma_i^2\,2^{-2R_i}, \qquad \text{(IV.19)}$$

where $C_i$ depends on the pdf of the quantized variable, and $\sigma_i^2$ is the variance of the input
values. Using this relationship, the Lagrangian method yields the following optimal
solution [48],

$$R_i = \bar{R} + \frac{1}{2}\log_2\frac{\sigma_i^2}{\rho^2}, \qquad \text{(IV.20)}$$

where $\bar{R} = R/N$ is the mean bit rate per layer, $N$ is the number of layers, and

$$\rho^2 = \left(\prod_{i=0}^{N-1}\sigma_i^2\right)^{1/N}. \qquad \text{(IV.21)}$$

The allocation given by Eq. (IV.20) ensures that each quantizer has the same average
distortion.
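
A short numerical sketch of the allocation in Eq. (IV.20) follows. It assumes the standard form in which each layer receives the mean rate plus half the base-2 logarithm of the ratio of its variance to the geometric mean of all layer variances, with equal constants for every layer. The variances and rates are hypothetical.

```python
import math
from typing import List


def allocate_bits(total_rate: float, variances: List[float]) -> List[float]:
    # R_i = R_mean + 0.5 * log2(var_i / geometric_mean(variances))
    n = len(variances)
    mean_rate = total_rate / n
    log_geo_mean = sum(math.log2(v) for v in variances) / n
    return [mean_rate + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]


layer_vars = [256.0, 16.0, 4.0]          # hypothetical layer variances
rates = allocate_bits(6.0, layer_vars)   # 6 bits shared among three layers
# With equal constants, var_i * 2^(-2 R_i) is the same for every layer.
print(rates, [v * 2 ** (-2 * r) for v, r in zip(layer_vars, rates)])
# A small budget with widely spread variances yields a negative allocation.
print(allocate_bits(1.5, layer_vars))
```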

Using Eq. (IV.20), one possible frame-based rate control scheme could be
implemented as follows. First, establish the bit allocation $R$ for the current frame. Then,
calculate the bit allocation for each layer using Eq. (IV.20). Finally, allocate the bits
evenly per coefficient for each layer. The bits allocated per pixel are used to calculate the
quantizer step size. The following relationship is suggested to calculate quantizer step
size for low bit rate video traffic [79]:

$$Q_i = \sqrt{\frac{e}{\log_e 2}\,\frac{\sigma_i^2}{R_i}}, \qquad \text{(IV.22)}$$

where $e$ is Napier's constant and $\sigma_i^2$ is the variance of subband $i$. For macroblocks that
use multiple quantizer step sizes, such as in JPEG coding, the result is used to establish
the average step size for the macroblock.
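
The step-size relationship of Eq. (IV.22) can be evaluated directly. The sketch assumes the equation reads as the square root of (e x variance) / (ln 2 x bits per coefficient), which is one reading of the low bit rate model cited above; the function name and the numbers are hypothetical.

```python
import math


def step_size(variance: float, bits_per_coeff: float) -> float:
    # Low bit rate model (assumed form): Q = sqrt(e * var / (ln 2 * R)).
    # Fewer bits per coefficient gives a larger (coarser) step size.
    return math.sqrt(math.e * variance / (math.log(2) * bits_per_coeff))


# Hypothetical subband with variance 100 at 0.5 and 1.0 bits per coefficient.
print(step_size(100.0, 0.5), step_size(100.0, 1.0))
```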

There are several drawbacks with this approach for rate control. The most
important is that ensuring equal average distortion at each quantizer does not account for the
perceptibility of errors in different frequency bands, and allocating errors in a different
manner could provide perceptually better results. The allocation also depends on
knowledge of the variances exhibited by each layer. Although representative variances
may be calculated a priori using test sequences, more accurate allocation requires
dynamically estimating the variances, a computationally expensive procedure. Another
problem is that Eq. (IV.20) may lead to negative bit allocations if the difference in
variances between layers is large. This problem is correctable by forcing non-negative
allocations in Eq. (IV.20) although the resulting allocations would not be optimal.
Finally, Eq. (IV.20) does not take coding gain into account. Therefore, using $R$ as the
target bit allocation leads to bit allocations that are too low after taking VLC coding into
account. One ad hoc fix is to replace $R$ in the expression with

$$R' = R\,G, \qquad \text{(IV.23)}$$

where $G$ is the estimated coding gain expected from the entropy encoder.
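
One way to force non-negative allocations is sketched below. The text does not say how the correction was made, so the method shown (give zero bits to any layer whose allocation is negative, then re-solve over the remaining layers) is an assumption, named plainly as a common remedy rather than the author's procedure. The coding-gain adjustment of Eq. (IV.23) is included as a one-line helper. All names and numbers are hypothetical.

```python
import math
from typing import List


def allocate_bits_nonneg(total_rate: float, variances: List[float]) -> List[float]:
    # Repeatedly apply the log-variance allocation to the active layers,
    # dropping (setting to zero) any layer that comes out negative.
    active = list(range(len(variances)))
    alloc = [0.0] * len(variances)
    while active:
        mean_rate = total_rate / len(active)
        log_gm = sum(math.log2(variances[i]) for i in active) / len(active)
        trial = {i: mean_rate + 0.5 * (math.log2(variances[i]) - log_gm)
                 for i in active}
        negative = [i for i in active if trial[i] < 0]
        if not negative:
            for i in active:
                alloc[i] = trial[i]
            break
        active = [i for i in active if i not in negative]
    return alloc


def effective_target(rate: float, coding_gain: float) -> float:
    # Eq. (IV.23): inflate the target by the expected entropy-coding gain.
    return rate * coding_gain


print(allocate_bits_nonneg(1.5, [256.0, 16.0, 4.0]))
print(effective_target(64000.0, 1.5))
```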

More  sophisticated  algorithms  using  the  rate-distortion  concept  are  available.  For 
example,  "greedy"  schemes  allocate  bits  one  at  a  time  to  the  quantizer  demonstrating  the 
most  distortion  [78].  Other  schemes  apply  Lagrange  multipliers  to  arbitrary  rate- 
distortion  curves  [80].  However,  computational  complexity  and  delay  limit  the  feasibility 
of  more  advanced  methods  when  dealing  with  real-time  video. 

2. Approximation of the 3-D Rate-Distortion Curve

The  approach  above  assumes  that  distortion  is  additive  in  the  operational  coder 
and  gives  the  same  average  distortion  for  each  quantizer  regardless  of  the  relative 
perceptual  importance  of  errors  in  each  layer.  The  assumption  of  additive  distortion 
implies  that  a  decrease  in  rate  requires  a  suitable  decrease  in  all  quantizer  parameters  to 
yield  an  optimal  solution.  Rate-distortion  curves  in  the  operational  coder  are  not 
necessarily  convex,  so  the  above  approach  does  not  necessarily  yield  optimal  results.  An 
alternate,  albeit  heuristic,  approach  is  to  simplify  the  control  problem  by  creating  a 
simplified,  operational  rate-distortion  curve. 

An  operational  distortion  curve  is  created  by  first  plotting  total  bit  rate  and 
distortion  (as  measured  by  pSNR)  separately  in  a  three-dimensional  space  spanned  by  the 
set  of  candidate  quantizers  for  a  series  of  motion  video  sequences.  This  process  captures 
the  operational  effect  of  the  coder  design,  such  as  the  quantizers  and  VLC  coding  as  well 
as  any  interdependence  between  layers,  on  the  rate-distortion  relationship.  The  result  is 
best described as a 4-D surface wherein both rate and distortion are functions of a triplet
of quantizer parameters $\{q_1, q_2, q_3\}$. The first parameter represents the JPEG scaling
factor while the remaining parameters represent the actual quantizer step sizes.

Next,  the  points  representing  the  pSNR  surface  are  sorted  in  ascending  order  and 
associated  with  their  corresponding  quantizer  triplets.  For  those  triplets  producing 
approximately  the  same  pSNR,  only  that  point  with  the  smallest  bit  rate  is  retained.  The 
result  is  an  implicit  vector  quantization  of  the  operational  3-D  rate-distortion  surface. 
The dimensionality of the operational rate-distortion curve is therefore reduced to the 1-D
curve covering the operational range of the coder as shown in Figure IV.23. Each point
on the curve represents results from a single quantizer triplet. The corresponding
quantizer triplets are plotted in Figure IV.24. The results indicate that an optimal rate
control scheme does not necessarily increase/decrease each quantizer parameter in
lockstep as would be expected if distortion in each layer were independent.

Figure IV.23: Operational Rate-Distortion Curve for Motion Video.

Figure IV.24: Quantizer Table Triplet Values for Motion Video.
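
The reduction from measured (triplet, rate, pSNR) samples to a one-dimensional table can be sketched as follows. The bin width used to decide that two triplets produce "approximately the same" pSNR is an assumption, as are the function name and the sample measurements.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Tuple[int, int, int], float, float]  # (triplet, bits/frame, pSNR dB)


def build_codebook(samples: List[Sample], psnr_bin: float = 0.5) -> List[Sample]:
    # Group samples into pSNR bins and keep, within each bin, only the
    # triplet with the smallest bit rate. Entries are returned in order
    # of ascending pSNR.
    best: Dict[int, Sample] = {}
    for triplet, bits, psnr in samples:
        key = int(psnr // psnr_bin)
        if key not in best or bits < best[key][1]:
            best[key] = (triplet, bits, psnr)
    return [best[k] for k in sorted(best)]


# Hypothetical measurements from test sequences.
measured = [
    ((1, 4, 4), 9000.0, 33.1),
    ((2, 2, 2), 11000.0, 33.3),
    ((2, 8, 8), 6000.0, 30.2),
    ((4, 4, 8), 6500.0, 30.4),
    ((8, 16, 16), 3000.0, 26.0),
]
for entry in build_codebook(measured):
    print(entry)
```

Each surviving entry is one point on the 1-D operational curve; its position in the list serves as the single index a rate controller can step up or down.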

Reducing  the  rate-distortion  relationship  to  a  suboptimal  1-D  relationship 
provides  a  potential  method  for  a  simplified  layered  rate  control  scheme  since  the  set  of 
possible  quantizer  parameters  is  reduced  to  a  more  manageable  set  of  suboptimal 
parameters. Considering each triplet as a suboptimal quantizer state, a feedback control
scheme manipulates the quantizers for each layer by selecting only entries from this set
via  table  lookup.  One  possible  method  is  considered  next. 

Using  the  operational  rate-distortion  curve,  a  control  curve  relating  bits  per  frame 
to each suboptimal quantizer vector is created as shown in Figure IV.25. After
linearizing the control curve over the operational range of the coder, the slope represents
the average increment or decrement in bits per frame with a step change in the quantizer
table. Dividing this quantity by the average number of macroblocks selected per frame in
the test sequences yields the desired control parameter $\beta$,

$$\beta = \frac{\Delta B}{\Delta Q\,\bar{N}_{MB}}, \qquad \text{(IV.24)}$$

where $\bar{N}_{MB}$ represents the average number of macroblocks selected per frame and $\Delta B / \Delta Q$ is
the slope of the control curve. In Figure IV.25, $\beta$ was determined to be -11.0
bits/macroblock-step. The control parameter is then used to adjust the coder quantizer
vector with each new frame as per the following scheme.

Figure IV.25: Operational Rate Control Curve.
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At  call  setup,  the  average  bit  allocation  per  frame  is  set  to 

B=-^^^,  (IV.25) 

where  i?target  ^^  ^^^  channel  bit  rate  and/ is  the  frame  rate.  For  each  new  frame  /,  we  use 

the  actual  bit  allocation  from  the  last  frame  /  -  1  to  estimate  the  bit  allocation  error  or 
deviation  expected  for  the  current  frame  /  if  the  quantizer  vector  used  in  the  last  frame  is 
not  changed.  Accounting  for  the  change  in  the  number  of  macroblocks  selected  between 
the  last  and  current  frames,  the  deviation  expected  is: 


AB  ,    =B- 


^^  MB, 


Nmb.. 


B._,.  (IV.26) 


The  required  change  in  the  quantizer  setting  is  calculated  using  the  deviation  AB  inter,  the 
number  of  macroblocks  selected  for  transmission  in  the  current  frame  A^^^  ,  and  the 

control  parameter: 

A5.„.„ 


AQ,= 


(IV.27) 


where  |_  J  is  the  fixed  integer  operator,  which  discards  the  decimal  portion  of  the  result. 

The  result  indicates  that  the  quantizer  setting  from  the  last  frame  should  be  incremented 
or  decremented  by  AQ ,.  K  the  quantizer  has  reached  the  upper  or  lower  limit  of  the 
table,  the  value  is  not  changed. 
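
The frame-level update of Eqs. (IV.25) through (IV.27) can be sketched as follows. This is an illustrative reading rather than the coder's actual implementation: the function names are hypothetical, the limit handling is interpreted as clamping the index to the table bounds, and the 10 frames/s used in the example is an assumed frame rate.

```python
import math

def frame_bit_budget(r_target_bps: float, frame_rate: float) -> float:
    """Average bit allocation per frame, as in Eq. (IV.25)."""
    return r_target_bps / frame_rate

def next_quantizer_index(q_prev: int, bits_prev: float, n_mb_prev: int,
                         n_mb_cur: int, b_bar: float, beta: float,
                         q_min: int, q_max: int) -> int:
    """Return the quantizer table index for the current frame."""
    if n_mb_prev == 0 or n_mb_cur == 0:
        return q_prev  # nothing to scale by; keep the previous setting (assumption)
    # Expected deviation if the quantizer is left unchanged, Eq. (IV.26)
    delta_b = b_bar - bits_prev * n_mb_cur / n_mb_prev
    # Step change, Eq. (IV.27); math.trunc discards the decimal portion
    delta_q = math.trunc(delta_b / (beta * n_mb_cur))
    # Assumed limit handling: never move past either end of the table
    return max(q_min, min(q_max, q_prev + delta_q))

b_bar = frame_bit_budget(80_000, 10)  # 80 kbps at an assumed 10 frames/s
q_over = next_quantizer_index(4, 9100, 50, 50, b_bar, -11.0, 0, 9)
q_hold = next_quantizer_index(4, 7700, 50, 50, b_bar, -11.0, 0, 9)
q_top = next_quantizer_index(9, 9100, 50, 50, b_bar, -11.0, 0, 9)
```

Because $\beta$ is negative, a frame that overshoots the budget yields a positive step toward coarser quantizer states, and small deviations truncate to a zero step.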

Video traces for a rate-controlled video sequence and a video sequence using only
open-loop control are shown in Figure IV.26. Open-loop control consists of selecting the
quantizer setting that results in the bit rate closest to the one desired and then not
changing the setting for the duration of the sequence. In each case, the target bit rate was
80 kbps. The results indicate that the frame-based rate controller maintains the local
average closely and also smooths the bit rate somewhat as measured by each sequence's
variance. As presented in the next chapter, smoothing the bit rate increases multiplexer
efficiency and reduces bandwidth requirements. The drawback of rate control is a slight
variation in frame-to-frame quality relative to open-loop control as shown in Figure
IV.27. The statistics for each sequence are listed in Table IV.9. A variation of this
approach was examined that widens the window used to predict the current deviation from
just one frame to $m$ frames, as indicated in Eq. (IV.28), to reduce bit variations. Offline
coders look back $m$ frames to calculate the deviation [81], but increasing the search
window as in

$$\Delta B_{inter} = m\bar{B} - \sum_{j=1}^{m} \frac{N_{MB,i}}{N_{MB,i-j}} B_{i-j} \tag{IV.28}$$

actually resulted in looser tracking in the sequences examined.

Figure IV.26: Bit Rate Traces for a) Controlled and b) Uncontrolled Video Sequences.


Changing the quantizer only at the beginning of each frame may provide
insufficient granularity to adequately suppress deviations from the desired bit rate. In this
case, a more desirable approach is to examine the quantizer vector at each macroblock and
make changes as required to control the target bit distribution among the macroblocks.
However, this approach is more complex than frame-based control and may cause quality
variations throughout the frame during high activity periods. One simple scheme is to
distribute the average bit allocation for each frame evenly among all the selected
macroblocks in a similar manner to the Telenor rate control scheme [55]. Given that
$N_{MB,i}$ macroblocks are selected in the current frame and an average bit allocation of $\bar{B}$
bits is used, each macroblock receives $\bar{B}/N_{MB,i}$ bits.

Controlling bit rate at the macroblock level is performed as follows. At call setup,
average bit allocation per frame is set to

$$\bar{B} = \frac{R_{target}}{f}, \tag{IV.29}$$

where $R_{target}$ is the channel bit rate and $f$ is the frame rate. For the first macroblock of the
new frame $i$, we calculate the expected deviation in the bits allocated to the current frame
if the quantizer setting from the last frame is not changed as above and apportion this
deviation over the number of macroblocks selected. This value is used to determine the
change required in the quantizer setting for the first macroblock:

Figure IV.27: pSNR Variation for a) Controlled and b) Uncontrolled Video Sequences.

Parameter               With Rate Control    Without Rate Control
Mean Bit Rate (bpf)     7998                 7454
Bit Rate STD (bpf)      942                  1362
Mean pSNR (dB)          29.83                29.51
pSNR STD (dB)           1.92                 1.74

Table IV.9: Rate Controlled and Uncontrolled Sequence Statistics.

For each remaining macroblock $j$, $j = 2$ to $N_{MB,i}$, we calculate the deviation between the
bits allocated so far within the frame and the target linear distribution. Assuming that the
number of bits allocated so far within the current frame is $B_{i,j-1}$, and given the target
bit allocation per macroblock indicated by Eq. (IV.30), the deviation at macroblock $j$ is:

$$\Delta B_{intra} = \frac{(j-1)\,\bar{B}}{N_{MB,i}} - B_{i,j-1}. \tag{IV.31}$$

This deviation is then used to set the quantizer parameter for the current macroblock:

$$\Delta Q_{i,j} = \left\lfloor \frac{\Delta B_{intra}}{\beta} \right\rfloor. \tag{IV.32}$$
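
A minimal sketch of the per-macroblock update follows. It assumes that the linear target after $j-1$ macroblocks is $(j-1)$ times the per-macroblock share of the frame budget and that the deviation is converted to a step change by dividing by the control parameter; the function name and the numbers in the example are hypothetical.

```python
import math

def macroblock_delta_q(j: int, bits_so_far: float, b_bar: float,
                       n_mb: int, beta: float) -> int:
    """Quantizer step change for macroblock j (1-based, j >= 2)."""
    target_so_far = (j - 1) * b_bar / n_mb   # linear target distribution (assumption)
    delta_b = target_so_far - bits_so_far    # positive when under budget
    return math.trunc(delta_b / beta)        # discard the decimal portion

# 8000-bit frame budget spread over 50 selected macroblocks, beta = -11
step_over = macroblock_delta_q(11, 1655, 8000, 50, -11.0)  # 55 bits over target
step_even = macroblock_delta_q(11, 1600, 8000, 50, -11.0)  # exactly on target
```

With a negative control parameter, overshooting the running target produces a positive step toward a coarser quantizer state for the next macroblock.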


One possible objection to rate control at the macroblock level using the scheme
above is that the linear bit allocation across the selected macroblocks takes into account
neither the level of activity within each macroblock nor the perceptual importance of
individual macroblocks. Therefore, the linear approach can be generalized by
introducing a weighting factor $W_j$ for each macroblock that represents the relative
proportion of bit allocation to be assigned to that macroblock:

$$B_{ij} = W_j B_i. \tag{IV.33}$$

The only constraint placed on $W_j$ is that all weights sum to 1 to achieve

$$\sum_{j=1}^{N_{MB,i}} B_{ij} = B_i. \tag{IV.34}$$

The linear assignment scheme, with $W_j = 1/N_{MB,i}$, obviously meets this condition. Two
approaches provide a means to tailor bit allocation to macroblock activity level. First,
macroblock selection rate provides a heuristic indication of motion within the current
scene. Given the set of macroblocks selected for the current frame, each macroblock's
past selection history can be used to determine a selection probability $p_j$ relative to the
current set. Such a selection probability provides a convenient measure of motion.
Those blocks that are selected more often tend to lie in regions of greater motion.
Therefore, an appropriate weighting factor that emphasizes regions of greater motion is to
set $W_j$ from the selection probability $p_j$.
However, the coder must refresh selection counts after every scene change to avoid
biasing the motion detection. Another approach is to weight the bit allocation by the
variance exhibited by each macroblock, thereby allocating more bits to macroblocks with
higher variance. A similar approach is followed in [77]. Using the rate distortion
allocation scheme outlined above, a weighting factor based on variance is

$$W_j = \frac{B_{ij}}{B_i} = \frac{\dfrac{B_i}{N_{MB,i}} + \dfrac{1}{2}\log_2 \dfrac{\sigma_j^2}{\rho^2}}{B_i} = \frac{1}{N_{MB,i}} + \frac{1}{2 B_i}\log_2 \frac{\sigma_j^2}{\rho^2}, \tag{IV.36}$$

where $B_i$ is the current frame bit allocation, $B_{ij}$ is the allocation for the $j$th selected
macroblock, and $\sigma_j^2$ is the variance of the coefficients in the $j$th macroblock. The only
drawbacks to this scheme are that weights may be negative and macroblock variance
must be tracked, which increases computational overhead.
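
The weighted allocation of Eq. (IV.33) can be illustrated with a short sketch. The normalization of raw selection counts into weights is an assumption made here so that the weights sum to 1; it is not necessarily the weighting used by the coder, and the function names are hypothetical.

```python
from typing import List

def selection_weights(counts: List[int]) -> List[float]:
    """Turn per-macroblock selection counts into weights that sum to 1."""
    total = sum(counts)
    if total == 0:
        # No history yet (e.g. just after a scene change): fall back to
        # the linear scheme W_j = 1 / N_MB
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def allocate_bits(frame_bits: float, weights: List[float]) -> List[float]:
    """Per-macroblock allocation B_ij = W_j * B_i."""
    return [w * frame_bits for w in weights]

weights = selection_weights([1, 3, 4])   # hypothetical selection counts
shares = allocate_bits(8000, weights)
```

Resetting the counts to zero after a scene change makes the sketch revert to the linear allocation until new selection history accumulates.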

Continuing this approach with static video produces interesting results. As shown
in Figure IV.28, the operational rate-distortion curve is relatively flat over a wide range
of bit rates. Since the coder's operational range falls into this region, rate control as
described above is not possible since all of the quantizer states produce the same level of
quality. However, rate control is not a distinct requirement for static sequences. Since
macroblocks are only transmitted due to aging, bit rates for static sequences are
considerably less than those observed in motion video sequences. Accordingly, open-loop
rate control is adequate for static sequences. The quantizers are preset for static
sequences to the quantizer triplet that yields the lowest bit rate in the flat distortion region
and fixed for the duration of the sequence.

Figure IV.28: Operational Rate-Distortion Curve for Static Sequences.

The clear implication of rate control is that any change in the quantizer setting
must be communicated to the decoder. Although the operational control curve shown in
Figure IV.25 reduces the amount of data used to describe each quantizer state,
transmitting the quantizer setting consumes bandwidth, and update frequency should be
minimized. Therefore, at a minimum, the current quantizer vector must be transmitted
with the frame header using frame-based rate control and with each macroblock using
macroblock-based rate control. In either case, using a VLC code to communicate only
the change in quantizer setting, as in differential pulse coding, can further reduce
overhead. However, the minimal approach directly conflicts with the need for robust
coding. If the frame header is damaged, the quantizer settings for that frame are lost.
Differential coding creates a liability unless some facility is made for refreshing the
quantizer state after any interruption due to lost cells. To ensure that each GOB is
independently decodable for robustness, the following compromises are possible. For
frame-based rate control, the quantizer setting, in the form of the lookup table index, is
included in every GOB header. For macroblock-based rate control, the quantizer setting
is coded differentially between macroblocks within the GOB and refreshed every GOB.
Differential coding within the GOB poses no liability since a dropped cell
interrupts decoding until the decoder resynchronizes with the next GOB header.
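
The macroblock-based compromise can be sketched as follows: the GOB header carries the absolute lookup table index, and each later macroblock carries only the change from its predecessor. The function names are hypothetical, and the VLC coding of the differences is not shown.

```python
from typing import List, Tuple

def encode_gob(q_indices: List[int]) -> Tuple[int, List[int]]:
    """Split a GOB's quantizer indices into a header value and differences."""
    if not q_indices:
        raise ValueError("a GOB must contain at least one macroblock")
    header = q_indices[0]                  # absolute index, refreshed every GOB
    deltas = [cur - prev for prev, cur in zip(q_indices, q_indices[1:])]
    return header, deltas

def decode_gob(header: int, deltas: List[int]) -> List[int]:
    """Rebuild the quantizer indices from the header value and differences."""
    out = [header]
    for d in deltas:
        out.append(out[-1] + d)
    return out

header, deltas = encode_gob([5, 5, 6, 4])
restored = decode_gob(header, deltas)
```

Since every GOB restarts from an absolute index, an error in the differences cannot propagate beyond the GOB in which it occurs.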
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This  chapter  introduced  a  new  layered  coder  design  motivated  by  the  need  to 
provide  a  flexible  video  delivery  scheme  for  greater  robustness  over  heterogeneous 
networks.  Attention  was  focused  on  those  elements  required  to  promote  the  effectiveness 
of  layered  coding.  In  general,  the  coder  uses  the  fast  Haar  transform  to  decompose 
selected  macroblocks  into  subbands,  and  then  subbands  are  allocated  to  layers  based  on 
their  relative  perceptual  importance.    Specifically,  a  generalized  layering  scheme  was 
devised  for  motion  video  that  allows  creation  of  an  arbitrary  layering  scheme  as  a 
function  of  video  content  as  evidenced  by  subband  variance.  However,  a  common 
layering  scheme  for  motion  video  and  static  presentation  slides  is  impractical  since  each 
attaches  a  different  perceptual  relevance  to  the  various  subbands.  Therefore,  different 
layering  schemes  are  employed  for  each  type  of  video  content;  the  coder  picks  the 
appropriate  scheme  dynamically  within  the  video  sequence. 

A final issue examined was that of rate control for the layered video sequence.
Since subbands are essentially layered by common variance, each layer employs a
different quantization scheme. Rate control via traditional rate-distortion techniques is
complicated by the increased dimensionality of the layered coder's rate-distortion surface
and the possible inter-dependence among quantizers. Rate control is simplified by
selecting a suboptimal set of quantizer vectors, where each vector consists of a step size for
each quantizer, thereby effectively reducing the operational rate-distortion curve to a 1-D
relationship. Rate control, either at the frame level or macroblock level, is implemented
via a simple table lookup.

V. TRAFFIC SMOOTHING

The  previous  chapter  presented  a  new  scheme  for  preparing  a  video  sequence  for 
transmission  over  the  network  by  coding  the  sequence  as  a  hierarchical  series  of  layers. 
The  next  chapter  exploits  the  relative  perceptual  importance  of  each  layer  through 
priority-based  scheduling.  However,  the  manner  in  which  the  layers  are  transmitted  to 
the  network,  i.e.  the  statistical  characteristics  of  each  cell  flow,  plays  a  role  in 
determining  the  resources  each  switch  must  commit  to  the  sender  to  guarantee  that 
sender's  required  QoS.  In  general,  the  more  random  the  cell  flow,  the  more  resources, 
such  as  bandwidth,  must  be  committed.  Consequently,  by  manipulating  the  statistical 
characteristics  of  each  traffic  flow  prior  to  the  network,  the  network's  capacity  for 
carrying  traffic  is  enhanced,  which  is  particularly  desirable  for  low-bit-rate  networks. 

This chapter examines the concept of traffic smoothing for layered video traffic as
a means for increasing transmission robustness by increasing queuing efficiency. The
chapter starts by discussing the concept and application of traffic smoothing. Next, the
pseudo-histogram traffic model proposed by Skelly et al. for VBR video is presented
[14]. The pseudo-histogram has the advantage of capturing the effect of frame-by-frame
smoothing on queue behavior. Details on determining model parameters and analytical
techniques for D/D/1/K queues are presented, including a simple technique for
rate-controlled video. Finally, an integrated scheme is proposed for traffic smoothing of
layered video traffic at various time scales: frame level, layer level, and cell level. The
issue of where to apply traffic smoothing for the single VCC and multiple VCC cases is
examined along with the issue of mitigating delay added by frame-by-frame smoothing.

A.         INTRODUCTION 

One of the functions of ATM traffic management is call acceptance, which
ensures that sufficient network resources exist prior to accepting a new connection with
specified QoS requirements. The requisite resource allocation as a function of the
required QoS depends on statistical properties of the connection's traffic flow. The
requisite allocation may also depend on the properties of other connections currently
within the network. Each new connection characterizes its anticipated traffic properties
via a set of descriptors that depend on the type of service required [28]. Possible traffic
descriptors include the peak cell rate (PCR), the sustainable cell rate (SCR), and the maximum
burst size (MBS). The network layer then uses these traffic descriptors and the current
network state to determine whether to admit the call. If the call is admitted, a traffic
contract is formed between the connection and the network. The connection agrees to
abide by the traffic descriptors and the network agrees to allocate resources such that the
connection's QoS is maintained.

Assuming  that  the  VCC  traverses  sequential  queues,  QoS  is  guaranteed  by 
ensuring  that  sufficient  channel  allocation  exists  at  each  queue  such  that  the  QoS 
parameters  are  maintained.  Focusing  on  an  individual  queue,  the  required  channel 
allocation  depends  on  the  arrival  process,  the  QoS  required,  and  the  service  process.  For 
ATM  networks,  service  is  deterministic.  However,  the  service  rate  depends  on  the 
required  QoS  and  the  arrival  process.  For  a  given  QoS  and  a  given  arrival  process,  the 
goal  is  to  minimize  the  service  rate  required. 

Since QoS is usually fixed for each particular traffic type, the arrival process
weighs heavily in the channel allocation. The traffic flow within each connection may be
viewed as a random process. In general, the channel allocation to that traffic flow
depends on the relative uncertainty or random variation in its arrival process at a
particular queue. In particular, the greater the uncertainty in a traffic source's arrival
process, the greater the bandwidth required to meet the desired QoS. For example, CBR
traffic is completely characterized by its peak cell rate alone. By definition, the
instantaneous arrival rate for VBR traffic is time varying although the average rate is
fixed (otherwise an ATM network would not be able to ensure QoS for the duration of the
connection). A simple method for characterizing the variation in the arrival rate is the ratio of
PCR to average cell rate [82]. This ratio represents the burstiness of the source; a higher
ratio denotes a burstier source. For a CBR source, this ratio is one. Alternatively, the
burstiness of a VBR source can also be expressed in terms of the variance of cell
interarrival times [82].
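
Both burstiness measures are simple to compute from a trace. The sketch below assumes a list of per-interval cell rates and a list of cell arrival times; the function names and sample values are hypothetical.

```python
from statistics import mean, pvariance
from typing import List

def peak_to_mean(rates: List[float]) -> float:
    """Burstiness as the ratio of peak rate to average rate."""
    return max(rates) / mean(rates)

def interarrival_variance(arrival_times: List[float]) -> float:
    """Burstiness as the variance of the gaps between successive arrivals."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return pvariance(gaps)

cbr_ratio = peak_to_mean([10, 10, 10])            # constant rate
vbr_ratio = peak_to_mean([5, 10, 15])             # varying rate
cbr_var = interarrival_variance([0, 1, 2, 3])     # evenly spaced cells
vbr_var = interarrival_variance([0, 1, 3, 4])     # unevenly spaced cells
```

A constant-rate trace gives a ratio of one and zero interarrival variance, so both measures grow only as the source departs from CBR behavior.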

The  problem  of  bandwidth  allocation  for  a  bursty  source  may  be  viewed  from  the 
perspective  of  a  deterministic  ATM  queue.  A  connection  is  guaranteed  to  lose  no  cells  if 
the service rate exceeds the arrival rate. With a bursty source, selecting the service rate
equal to the source's PCR ensures that no cells are lost. However, the channel is
underutilized  with  this  allocation.  Selecting  the  service  rate  equal  to  the  average  cell  rate 
fully  utilizes  the  channel  but  leads  to  a  large  amount  of  cell  loss.  Given  an  acceptable 
CLR,  the  appropriate  service  rate  lies  between  the  PCR  and  the  average  cell  rate,  which 
implies  that  a  certain  amount  of  underutilization  must  be  tolerated  to  achieve  the  desired 
QoS.  Of  course,  this  exact  characteristic  provides  the  basis  for  statistical  multiplexing 
since  the  aggregate  multiplexed  source  is  considerably  less  bursty  than  each  individual 
source. 

Given that uncertainty in the arrival process increases bandwidth requirements,
altering a connection's traffic characteristics through traffic shaping is desirable to
increase the number of connections that may be serviced with a given amount of
bandwidth. Alternatively, traffic smoothing increases robustness during periods of
congestion since leveling out bursts tends to reduce the probability of buffer overflows.
Both considerations are especially important given the low bandwidth VTC scenario
presented here. Traffic shaping may be further differentiated into the functions of traffic
smoothing and traffic policing. Traffic smoothing attempts to reduce or control
burstiness either at the application level or at some point prior to entry into the network.
Traffic policing monitors a connection's traffic parameters and takes action to correct
deviations. For example, Usage Parameter Control (UPC) in ATM monitors each
connection to ensure that its traffic conforms to the traffic contract [18][28].
Non-compliant cells are tagged and may be dropped later in the network to avoid impacting
the QoS guaranteed to other connections. The two functions are not totally unrelated;
controlling burstiness, perhaps at the application level, may be viewed as a form of
self-imposed traffic policing. Here, attention is focused only on the application of traffic
smoothing to video traffic.

The first logical place to implement traffic smoothing is at the application level
through rate control. Rate control as presented in the last chapter represents a type of
self-imposed traffic policing; the rate controller attempts to maintain some traffic statistic
at a fixed level through control over the quantizer setting. Rate control also provides
an obvious mechanism for traffic smoothing. Forcing transmitted video to a constant bit
rate completely removes the burstiness inherent in video traffic, but at the cost of
potentially wide variations in quality from frame to frame. A less severe tradeoff is to
settle for a constant mean bit rate, which is the approach taken in Figure IV.26. In this
case, quality variations between successive frames are less noticeable, and the level of
burstiness is decreased as indicated by the drop in bit rate variance (see Table IV.9).
Before rate control, the burstiness factor is 1.41; after imposition of rate control, the
burstiness factor drops to 1.21. Of course, controlling only the mean bit rate does not
guarantee any particular degree of smoothness. With proper design, a rate control
scheme should be able to achieve an arbitrary level of smoothness that is bounded only by
the permissible coding delay.

A more general method for smoothing a traffic flow prior to entry into the
network is the leaky bucket scheme proposed for network access control [83][84].
Access control ensures that a traffic source does not exceed the traffic parameters agreed
to as part of the traffic contract. The scheme is illustrated in Figure V.1. The basic idea
is that the leaky bucket mechanism controls access to the network. ATM cells arriving at
the leaky bucket must obtain a token from a token pool to enter the network. Tokens are
generated at a constant rate $r$ and placed in the token pool. Additionally, there is a
maximum limit on the number of tokens in the token pool at any time, and tokens
arriving after the token pool is full are discarded. The token pool is sized to control the
maximum burst length from the source, i.e., the maximum number of cells that can be
transmitted back-to-back. Restricting the number of tokens controls the burstiness of the
source while the token rate dictates the average cell rate. If a cell arrives and a token is
not available, three courses of action are available. The cell could be discarded; the cell
could be buffered until a token becomes available; or the cell could be tagged as
non-compliant and transmitted. The cumulative effect of buffering and manipulating the
token rate allows considerable flexibility in altering traffic statistics. However, buffering
introduces delays in the forward transmission path, and the gain offered by smoothing
must be weighed against the added delay.
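
The interaction between token rate, token pool size, and buffering can be illustrated with a small slotted simulation. This is a sketch under stated assumptions: time is divided into equal slots, the token pool starts full, cells that find no token are buffered, and cells that overflow the buffer are discarded (the tagging option is not modeled). The function name and parameters are hypothetical.

```python
from typing import List, Tuple

def leaky_bucket(arrivals: List[int], token_rate: float, pool_size: int,
                 buffer_size: int) -> Tuple[List[int], int]:
    """Return (cells departing in each slot, total cells dropped)."""
    tokens = float(pool_size)   # assumption: the pool starts full
    queue = 0                   # cells waiting for a token
    dropped = 0
    departures = []
    for cells in arrivals:
        # New tokens arrive; tokens beyond the pool limit are discarded
        tokens = min(float(pool_size), tokens + token_rate)
        queue += cells
        sent = min(queue, int(tokens))   # one whole token per departing cell
        tokens -= sent
        queue -= sent
        if queue > buffer_size:          # buffer overflow: discard the excess
            dropped += queue - buffer_size
            queue = buffer_size
        departures.append(sent)
    return departures, dropped

# A 5-cell burst, one token per slot, pool of 2 tokens
smooth, lost = leaky_bucket([5, 0, 0, 0, 0], 1.0, 2, buffer_size=10)
tight, lost_tight = leaky_bucket([5, 0, 0, 0, 0], 1.0, 2, buffer_size=1)
```

With a large buffer the burst is spread over several slots at the cost of delay, while a small buffer trades that delay for cell loss.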


[Diagram: arriving cells pass through a gate fed by a token buffer and leave as departing
cells; tokens are generated at r tokens/sec while the token buffer is not full.]

Figure V.1: Leaky Bucket Access Mechanism.

While  originally  conceived  as  an  access  control  mechanism,  the  leaky  bucket 
scheme  controls  by  smoothing  the  traffic  flow.  However,  smoothing  is  performed  for  the 
purpose  of  ensuring  compliance  with  the  traffic  contract.  The  approach  may  be 
generalized  for  smoothing  at  other  points  prior  to  network  entry,  such  as  at  the 
application  level  prior  to  the  AAL  or  within  the  AAL  prior  to  the  ATM  layer.  In  either 
case,  tokens  are  used  to  permit  transfer  of  PDUs  instead  of  ATM  cells.  This  offers 
another  avenue  for  smoothing  video  traffic  prior  to  network  entry.  For  example,  a  CBR 
type  smoothing  can  be  implemented  by  setting  the  token  rate  r  proportional  to  the 
channel  rate  and  setting  the  token  pool  size  to  one.  Then,  arriving  PDUs  are  buffered  and 
transmitted  to  the  next  lower  layer  at  the  token  rate,  maximizing  smoothness  but 
potentially  increasing  the  transmission  delay. 

Given the impact of traffic statistics on queuing efficiency, characterizing VBR
video traffic sources via stochastic models plays an important role in network
performance analysis. In particular, traffic models provide a powerful tool for analyzing
the impact of the arrival process on queue behavior through either simulation or
analysis. For example, traffic models can provide insight into determining
appropriate tradeoffs between buffer depth and service rate to achieve a desired QoS. For
a traffic model to be useful, the model should perform two functions. First, the model
must accurately represent traffic statistics, namely the first and second moments and the
covariance function. Second, to evaluate QoS metrics such as cell delay and cell loss and
to validate simulation results, the traffic model should extend to some form of analytical
queuing analysis. Meeting both of these goals is a non-trivial task.

B.         VIDEO  TRAFFIC  MODELING 

This  section  presents  three  VBR  video  traffic  models  as  background  for  traffic 
simulations  conducted  in  later  sections  and  to  motivate,  in  part,  the  smoothing 
mechanism  presented  in  the  next  section.  The  autoregressive  models  proposed  by 
Maglaris  et  al.  [86]  and  Sen  et  al.  [88]  are  interrelated  and  have  been  used  to  model  VTC 
video  traffic  [27].  The  histogram-based  video  traffic  model  proposed  by  Skelly  et  al.  [14] 
is  notable  in  that  it  captures  the  effect  of  smoothing  video  traffic  on  a  frame  by  frame 
basis  and  provides  particularly  versatile  queuing  analysis  techniques. 

Modeling  VBR  traffic  requires  capturing  the  interdependence  between  coder 
design  and  video  activity  level  that  influence  the  video  stream's  arrival  process. 
Important  factors  with  regard  to  the  coder  are  the  compression  scheme  employed, 
particularly  in  the  distribution  of  I-  and  P-frames,  and  the  presence  of  rate  control.  Video 
activity  influences  the  compression  gain  through  the  level  of  scene  activity  or  motion  and 
the  periodicity  of  scene  changes.  Video  traffic  models  attempt  to  accurately  capture  the 
first  and  second  moment  statistics  of  the  traffic  source  along  with  its  covariance  function. 
A useful traffic model also incorporates queuing analysis techniques that allow
calculation of QoS metrics, such as cell delay and cell loss rate, to validate simulation
results. Another desirable trait is low computational complexity.

1. Autoregressive Models

A representative video trace, in bits/pixel, is shown in Figure V.2 for a rate-controlled
"talking head" scene typically found in VTC. Such sequences usually are
characterized by a roughly Gaussian-shaped bit rate histogram and an exponentially
decaying autocorrelation function. On the strength of these observations, VBR traffic
models based on first-order autoregressive processes have been proposed by Maglaris et
al. [86] and Heyman et al. [87]. Using a first-order autoregressive model, the variation in
bit rate is expressed as

$$\lambda(n) = a\,\lambda(n-1) + b\,w(n) \tag{V.1}$$

where $w(n)$ is Gaussian white noise with unit variance but a non-zero mean. The
parameters in Eq. (V.1) are determined using the first and second-order statistics
measured from the video sequence along with the estimated autocorrelation function.
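
One way to determine the parameters of Eq. (V.1) is sketched below. It relies on the standard stationary moment relations for a first-order autoregressive process (mean $b\eta/(1-a)$, variance $b^2/(1-a^2)$, and lag-1 autocorrelation $a$, with $\eta$ the mean of the noise); these relations, the function names, and the sample statistics are assumptions for illustration rather than the exact fitting procedure of the cited models.

```python
import math
import random
from typing import List, Tuple

def fit_ar1(mean: float, variance: float, rho1: float) -> Tuple[float, float, float]:
    """Return (a, b, eta) matching a measured mean, variance, and lag-1 autocorrelation."""
    a = rho1                                  # autocorrelation decays as a**k
    b = math.sqrt(variance * (1.0 - a * a))   # from variance = b^2 / (1 - a^2)
    eta = mean * (1.0 - a) / b                # from mean = b * eta / (1 - a)
    return a, b, eta

def simulate_ar1(a: float, b: float, eta: float, n: int, seed: int = 0) -> List[float]:
    """Generate n samples of the process, starting at the stationary mean."""
    rng = random.Random(seed)
    x = b * eta / (1.0 - a)
    out = []
    for _ in range(n):
        x = a * x + b * rng.gauss(eta, 1.0)   # unit-variance noise with mean eta
        out.append(x)
    return out

a, b, eta = fit_ar1(mean=10.0, variance=4.0, rho1=0.6)   # hypothetical statistics
trace = simulate_ar1(a, b, eta, 100)
```

A synthetic trace generated this way reproduces the measured mean, variance, and exponential autocorrelation decay, but it carries no information about scene changes.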
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Figure  V.2:  Video  Trace  for  a  Low  Activity  Sequence. 

Although a first-order autoregressive process captures the effect of bit rate
variation, these models provide little insight into queuing behavior. Sen et al. [88] have
proposed a model for $N$ multiplexed video sources that can be applied in queuing
analysis. The model represents the aggregate video sequence as the output of $M$
multiplexed identical, two-state Markov chains, or minisources, where $M \gg N$. Each
minisource alternates between an off-state and an active state as shown in Figure V.3.
When multiplexed, the minisources yield an equivalent $(M+1)$-state Markov chain
wherein each state transmits at a fixed multiple of $R$ cells/second. Using 20 or more
minisources per video source reduces the effect of quantization. The model's parameters,
$\alpha$, $\beta$, and $R$, are determined from the first and second moments as well as the
autocorrelation function for a single video source; all video sources are assumed to have
the same statistical characteristics. Given the model parameters, cell loss probability and
buffer occupancy statistics are determined through fluid-flow analysis [27]. A
shortcoming of the minisource model is the inability to model an arbitrary bit rate
histogram since bit rate follows a binomial distribution [14].
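
The binomial character of the minisource model can be made concrete with a short sketch. It assumes that each minisource switches from off to active at rate $\alpha$ and from active to off at rate $\beta$, so that it is active with probability $\alpha/(\alpha+\beta)$ in steady state; the direction of these rates and the function names are assumptions, not details confirmed by the text above.

```python
from math import comb
from typing import List

def rate_distribution(m: int, alpha: float, beta: float) -> List[float]:
    """Steady-state probability that k of the m minisources are active, k = 0..m."""
    p_on = alpha / (alpha + beta)   # assumed steady-state probability of the active state
    return [comb(m, k) * p_on ** k * (1.0 - p_on) ** (m - k) for k in range(m + 1)]

def mean_rate(m: int, alpha: float, beta: float, r: float) -> float:
    """Mean aggregate rate when each active minisource sends r cells/second."""
    return m * r * alpha / (alpha + beta)

probs = rate_distribution(2, 1.0, 1.0)   # state k transmits at k * R cells/second
avg = mean_rate(2, 1.0, 1.0, 10.0)
```

Because the state probabilities are fixed by the single ratio $\alpha/(\alpha+\beta)$, the shape of the rate histogram cannot be chosen freely, which is the limitation noted above.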

While both of the above models do a good job of characterizing bit rate variations
within a scene, no attempt is made to capture the effect of scene changes. Given the
behavior of motion-compensated video coders, aperiodic bit rate peaks are expected due
to scene changes since, following a scene change, most macroblocks are intracoded due
to the lack of a suitable reference in the last frame.*

* For an analogous reason, periodic bit rate peaks occur in MPEG-encoded sequences due to the GOP
structure.

Figure V.3: Minisource Video Model.

2.  Histogram-based Traffic Modeling

The histogram-based video traffic model proposed by Skelly et al. [14] represents
an intermediate approach between autoregressive modeling and self-similar traffic
models. The premise of the model is very simple: quantize the arrival rates and then
approximate the video sequence by its quantized version. Motivation for the model stems
from the need to smooth the video traffic flow. Dixit and Skelly, in an earlier work [89],
demonstrated the relation between traffic smoothing and ATM multiplexer performance.
Given a buffered, compressed video frame, the resulting ATM cells could be transmitted
in several manners. For example, the cells could be transmitted at the peak available
channel rate until the buffer is emptied. The resulting traffic is very bursty since the
video coder transmits at a high rate for a brief period and then falls idle for the rest of the
frame. The problem with this approach is that when several sources are multiplexed, any
correlation between the burst periods tends to increase cell loss dramatically.

Dixit  and  Skelly  [89]  instead  proposed  to  transmit  the  buffered  cells  randomly 
over  the  entire  frame  interval  as  a  Poisson  stream  to  the  ATM  multiplexer.  Skelly  et  al. 
[14]  combined  this  smoothing  scheme  with  the  quantized  video  traffic  model  described 
above.  Each  quantized  level  represents  a  single  frame,  and  cells  from  each  quantized 
level  are  transmitted  as  a  Poisson  stream  to  the  multiplexer  over  one  frame  interval. 
Assuming  that  transitions  between  levels  may  occur  every  frame  and  that  the  transitions 
are memoryless, the resulting traffic model is a discrete-time multi-state Markov-
modulated Poisson process (MMPP) as shown in Figure V.4 (some transitions are
removed for clarity). The Markov chain serves to modulate the underlying Poisson-
smoothed arrival process, where each state i corresponds to a Poisson process whose
arrival rate $\lambda_i$ matches the size of the compressed frame in bits for that state. Shroff [15]
later expanded the MMPP model into the generalized histogram model, also known as a
Markov-modulated rate process (MMRP), which incorporates arrival processes other than
Poisson. In particular, Shroff demonstrated that the maximum queuing efficiency in
ATM multiplexers is achieved by smoothing deterministically, i.e., by transmitting cells
at equal intervals throughout the frame interval. The result resembles a modulated CBR
process with a new rate every frame.

Figure V.4: Markov-modulated Poisson Process (MMPP).

3.         Determining  Model  Parameters 

The histogram model parameters consist of the MMRP state probabilities, the
state transition probabilities, and the state arrival rates. They are estimated from the video
sequence in the following manner. The video sequence is uniformly quantized into n
bins, where each bin represents a single state. The quantized arrival rates $\lambda_i$ represent the
arrival rates for their respective states. Next, transition probabilities between states are
measured directly from the quantized sequence, yielding the state transition matrix P. The
steady-state distribution is given by

$$\pi = [\pi_1, \ldots, \pi_n], \tag{V.2}$$

where $\pi_i$ is the steady-state probability for state i. The state probabilities can be
determined by solving the eigenequation:

$$\pi = P\pi. \tag{V.3}$$

Alternatively, $\pi$ is the eigenvector of P whose corresponding eigenvalue is 1 [49]. Since
the rate of the modulating process is much slower than that of the modulated process, an
equivalent continuous-time Markov process is determined from [27]

$$M = f(P - I), \tag{V.4}$$

where f is the frame rate, and M is the infinitesimal generating function representing
transition rates from each state.
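The estimation steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: all function names are hypothetical, bin centres are taken as the state arrival rates, P is stored row-stochastic (row i holds the transition probabilities out of state i, so the steady state satisfies pi = pi P, which is Eq. (V.3) written for the transposed matrix), and the steady state is found by power iteration, which presumes an aperiodic, irreducible chain.

```python
def quantize(rates, n_bins):
    """Uniformly quantize a frame-size trace into n_bins states."""
    lo, hi = min(rates), max(rates)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant trace
    states = [min(int((r - lo) / width), n_bins - 1) for r in rates]
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    return states, centres


def transition_matrix(states, n_bins):
    """Measure frame-to-frame transition probabilities from the state sequence."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    P = []
    for row in counts:
        total = sum(row)
        # A state never left in the trace gets a uniform row (arbitrary choice).
        P.append([c / total for c in row] if total else [1.0 / n_bins] * n_bins)
    return P


def steady_state(P, tol=1e-12, max_iter=100000):
    """Power iteration for the distribution satisfying pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def generator(P, frame_rate):
    """Continuous-time equivalent M = f (P - I), as in Eq. (V.4)."""
    n = len(P)
    return [[frame_rate * (P[i][j] - (i == j)) for j in range(n)] for i in range(n)]
```

For a measured trace, `quantize` followed by `transition_matrix`, `steady_state`, and `generator` yields the full parameter set of the histogram model.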

Once  the  model  parameters  have  been  determined,  one  check  of  the  model's 
fitness  is  to  compare  the  model's  first  and  second  moments  and  autocorrelation  function 
to those of the actual sequence. For the model, the mean is given by:

$$E[\lambda(n)] = \sum_{i} \pi_i \lambda_i. \tag{V.5}$$

The autocorrelation function is given by [27]:

$$E[\lambda(n)\lambda(n+l)] = \sum_{i} \sum_{j} \lambda_i \lambda_j \, P[\lambda(n+l) = \lambda_j \mid \lambda(n) = \lambda_i] \, P[\lambda(n) = \lambda_i]. \tag{V.6}$$
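As a rough check of model fitness, the model moments can be evaluated numerically. The sketch below assumes a row-stochastic transition matrix P, so that the l-step conditional probability in the autocorrelation is entry (i, j) of P raised to the power l; the function names are hypothetical.

```python
def mat_mul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def model_mean(pi, lam):
    """Mean arrival rate: sum_i pi_i * lambda_i."""
    return sum(p * l for p, l in zip(pi, lam))


def model_autocorrelation(P, pi, lam, lag):
    """E[lambda(n) lambda(n+lag)] using the lag-step transition matrix P**lag."""
    n = len(P)
    P_lag = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(lag):
        P_lag = mat_mul(P_lag, P)
    return sum(lam[i] * lam[j] * pi[i] * P_lag[i][j]
               for i in range(n) for j in range(n))
```

Comparing `model_autocorrelation` at several lags against the sample autocorrelation of the trace gives the fitness check described in the text.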

Since  the  histogram  model  approximates  the  actual  histogram  of  the  given  video 
sequence,  the  model  is  able  to  support  a  wide  range  of  video  activity  and  compression 
schemes.  For  example,  while  the  MMRP  model  does  not  explicitly  model  scene  changes, 
the  peaks  in  bit  rate  resulting  from  scene  changes  are  implicitly  captured  in  the  higher 
states. Skelly et al. [14] presented results from 10-second JPEG-encoded sequences taken
from "Star Wars". Compared to the original sequences, the eight-bin model predicts a
slightly higher mean bit rate and provides a good match for the autocorrelation function
over a range of four seconds (96 frames). While increasing the resolution of the
histogram did not dramatically change the approximation, employing fewer than eight bins
resulted in a poor approximation. With rate-controlled video segments, satisfactory
results  have  been  reported  using  as  few  as  six  states  [90]. 

Given the histogram parameters for a single source, an equivalent histogram for N
homogeneous sources may be obtained through N - 1 convolutions [91]:

$$\pi^{N} = \pi * \pi * \cdots * \pi. \tag{V.7}$$

The state arrival rates are given by

$$\lambda_i^{N} = N\lambda_1 + (i-1)\Delta\lambda, \quad \Delta\lambda = \lambda_2 - \lambda_1, \quad i = 1, 2, \ldots, 2N-1. \tag{V.8}$$
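The repeated convolution can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the single-source arrival rates are assumed to be uniformly spaced, and list indices start at 0 where the text counts states from 1.

```python
def convolve(p, q):
    """Discrete convolution of two probability vectors."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def aggregate_homogeneous(pi, lam, n_sources):
    """Equivalent histogram for n_sources identical, independent sources."""
    agg = list(pi)
    for _ in range(n_sources - 1):  # N - 1 convolutions
        agg = convolve(agg, pi)
    step = lam[1] - lam[0]          # uniform spacing between state rates
    rates = [n_sources * lam[0] + i * step for i in range(len(agg))]
    return agg, rates
```

For example, two sources that each send at rate 2 or 6 with equal probability aggregate to rates 4, 8, and 12 with probabilities 0.25, 0.5, and 0.25.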

For heterogeneous sources, the process is slightly more difficult, and the equivalent
histogram must be resolved one source at a time. Given two non-equivalent histograms,
the joint histogram may be written as a two-dimensional Markov chain with $N^2$ states
[27]. The probability for state (m,n) is given by


and  the  aggregate  arrival  rate  by 


$$\lambda_{m,n} = \lambda_m + \lambda_n, \tag{V.10}$$

where the indices m,n refer to the corresponding states in the original Markov chains.
The  result  can  be  converted  back  into  a  one-dimensional  histogram  by  renumbering  the 
states  in  order  of  increasing  arrival  rate.  Compared  to  the  homogeneous  case,  the  size  of 
the  aggregate  histogram  grows  much  more  rapidly  although  coalescing  states  or  deleting 
highly  improbable  states,  in  comparison  to  the  simulation  length,  can  possibly  reduce  the 
size. 
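The two-source case can be sketched as below. Because the joint state probability is not legible in this copy, the sketch assumes the two sources are independent so that the probability of state (m,n) is the product of the marginals; that product form, the function name, and the rounding used to coalesce equal rates are all assumptions.

```python
def aggregate_heterogeneous(pi_a, lam_a, pi_b, lam_b, ndigits=9):
    """Joint histogram of two independent sources, renumbered by increasing rate.

    States with the same aggregate rate lambda_m + lambda_n are coalesced.
    """
    joint = {}
    for p_m, l_m in zip(pi_a, lam_a):
        for p_n, l_n in zip(pi_b, lam_b):
            rate = round(l_m + l_n, ndigits)
            joint[rate] = joint.get(rate, 0.0) + p_m * p_n
    rates = sorted(joint)
    return [joint[r] for r in rates], rates
```

Applying the function repeatedly, one source at a time, resolves the equivalent histogram for more than two heterogeneous sources.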

4.  Queuing  Analysis 

Cell  loss  analysis  proceeds  by  invoking  a  quasi-static  behavior  for  the  MMRP 
model  and  assuming  that  the  rate  of  the  modulating  process,  the  Markov  chain,  is  far 
slower  than  the  rate  of  the  modulated  process,  the  state  arrival  rate.  With  this 
assumption,  the  queue  is  expected  to  reach  equilibrium  rapidly  compared  to  the  time 
interval  between  frames,  and  each  state  may  be  treated  as  an  independent  source.  The 
probability  that  the  buffer  contains  n  cells  is  given  by  [27]: 

$$P[N = n] = \sum_{i} P[N = n \mid \lambda = \lambda_i]\,\pi_i, \tag{V.11}$$

where $\pi_i$ are the state probabilities, and $P[N = n \mid \lambda = \lambda_i]$ is the probability that the buffer
contains n cells given the arrival rate $\lambda_i$. From Eq. (V.11), the buffer distribution for each
individual state depends on the arrival process to the buffer, which in turn depends on the
smoothing mechanism and the type of service granted. Given that ATM uses fixed-
length cells, service is usually deterministic. Although the original histogram model used
Poisson smoothing, Shroff [15] has demonstrated that deterministic smoothing yields
better queuing performance. Therefore, further discussion is limited to D/D/1/K
queuing systems. Equation (V.11) indicates that the transition rates between states, and
by extension the shape of the autocorrelation function as given by Eq. (V.6), play no role
in determining the buffer occupancy distribution, as they would be expected to if
self-similarity were a significant factor. Indeed, Skelly's [14] results indicate that
accurately capturing the bit rate histogram plays a greater role in modeling buffer
distributions than the actual shape of the autocorrelation function.

Although buffer distribution is of interest, analyzing cell loss is more important in
determining appropriate buffer depths. The system loss probability, assuming that states
are independent, is given by [27]

$$P_L = \frac{1}{E[\lambda]} \sum_{i} \pi_i \lambda_i P_{L_i}, \tag{V.12}$$

where $E[\lambda]$ is given by Eq. (V.5), $\pi_i$ are the state probabilities, $\lambda_i$ are the arrival rates, and
$P_{L_i}$ are the loss probabilities for each state. Equation (V.12) represents the aggregate loss
rate as the sum of cells lost from each state over a long interval, weighted over each of n
states, divided by the expected number of arrivals. The individual loss rates in Eq. (V.12)
depend on the queuing system being evaluated. For a D/D/1/K queuing system, assuming
a very long sojourn time T for each state allows a simple approximation for the loss rate
[15]. If the arrival rate is less than the service rate, no cells will be lost since an arriving
cell finds the server idle or servicing a cell. If the service rate is less than the arrival rate,
cells not serviced during the sojourn time or buffered are lost. The loss probability in this
case is given by:

$$P_L = \frac{\lambda T - \mu T}{\lambda T} = 1 - \frac{1}{\rho}, \quad \rho = \frac{\lambda}{\mu}, \tag{V.13}$$

where T is the sojourn time, $\lambda$ is the arrival rate, and $\mu$ is the service rate. Considering
both scenarios, the loss rate for the ith state for deterministic arrivals and deterministic
service is:

$$P_{L_i} = \begin{cases} 0, & \rho_i = \lambda_i/\mu \le 1, \\ 1 - \dfrac{1}{\rho_i}, & \rho_i > 1. \end{cases} \tag{V.14}$$

Substituting the result from Eq. (V.14) for each state into Eq. (V.12) gives the system
loss probability.
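The substitution can be illustrated numerically. The sketch assumes, following the verbal description of Eq. (V.12), that the system loss is the sum of each state's lost traffic (state probability times arrival rate times per-state loss) divided by the mean arrival rate; the function names and the example numbers are hypothetical.

```python
def state_loss(lam_i, mu):
    """Per-state loss for deterministic arrivals and service, after Eq. (V.14)."""
    rho = lam_i / mu
    return 0.0 if rho <= 1.0 else 1.0 - 1.0 / rho


def system_loss(pi, lam, mu):
    """Aggregate loss: lost traffic over all states divided by mean arrivals."""
    mean_rate = sum(p * l for p, l in zip(pi, lam))
    lost = sum(p * l * state_loss(l, mu) for p, l in zip(pi, lam))
    return lost / mean_rate


# Two equally likely states at 50 and 200 cells/s served at 100 cells/s:
# only the second state loses cells (half of them), giving 50/125 = 0.4.
loss = system_loss([0.5, 0.5], [50.0, 200.0], 100.0)
```

Note that the buffer size K does not appear anywhere in the calculation, which is the property discussed next.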

For D/D/1/K systems, Eq. (V.14) indicates the counterintuitive result that cell loss
probability is independent of queue size K. However, cell loss behavior demonstrates
two distinct patterns dependent on buffer size, the cell region and the burst region, as
shown in Figure V.5 [27]. In the cell region, cell loss drops rapidly with buffer size, and
cell losses are confined to individual cells. This region is modeled well by Eq. (V.12). In
the burst region, cell loss drops at a slower but exponential rate with buffer size; cell
losses occur in bursts in this region, a behavior not captured by the histogram model.
Equation (V.14) indicates both regions coalesce into a constant value for D/D/1/K
systems. However, simulations show that these systems lie instead in the burst region
[27].


[Plot: log $P_L$ versus buffer size K, with the curve divided into a cell region and a burst region.]

Figure V.5: Cell and Burst Regions for Cell Loss.

Shroff offers an ad hoc technique for estimating cell loss probability using MMRP
models by incorporating fluid level analysis to capture behavior in the burst region [15].
In the cell region, loss is calculated using Eq. (V.12) with an appropriate expression for
$P_{L_i}$. In the burst region, fluid level analysis is used to predict the exponential relationship
with queue size in the form

$$P(x > K) = A e^{\delta K}, \tag{V.15}$$

where $\delta$ is the dominant eigenvalue from the fluid level representation of the system. Using
the infinitesimal generating function for the histogram model, $\delta$ is the least negative
eigenvalue of the matrix $D^{-1}M$, where D is given by:

$$D = \mathrm{diag}[\lambda_i - \mu]. \tag{V.16}$$

The constant A in Eq. (V.15) is determined by piecing the cell region and burst region
curves together at the cutoff point $K_0$ where both curves have equal slopes. Then the
constant A is a function of the cutoff buffer size and the cell region loss probability at that
buffer size,

$$A = P(x > K_0)_{\text{cell region}} \, e^{-\delta K_0}. \tag{V.17}$$

Together, Eq. (V.12) and Eq. (V.15) provide a complete description of the cell loss
behavior with queue size. An MMRP system with deterministic arrivals represents a
special case since the queue is always in the burst region. The cell loss probability is
determined by correcting Eq. (V.12) directly by the factor $e^{\delta K}$.
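The piecing-together step can be sketched as follows, assuming that the decay rate (called `delta` here, a negative number) and the cell-region loss at the cutoff are already known. The function names and the numbers in the example are illustrative only.

```python
import math


def burst_constant(cell_loss_at_k0, delta, k0):
    """Constant A chosen so the burst-region curve meets the cell-region value at K0."""
    return cell_loss_at_k0 * math.exp(-delta * k0)


def burst_loss(k, a, delta):
    """Burst-region estimate A * exp(delta * K); delta < 0 gives exponential decay."""
    return a * math.exp(delta * k)


# Illustrative numbers: cell-region loss of 1e-3 at a cutoff of 20 cells,
# with delta = -0.05 per cell of buffer.
a = burst_constant(1e-3, -0.05, 20)
loss_at_40 = burst_loss(40, a, -0.05)
```

By construction, `burst_loss(20, a, -0.05)` reproduces the cell-region value at the cutoff, and larger buffers yield exponentially smaller loss estimates.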

For  multiplexed  sources,  cell  loss  probability  is  determined  by  applying  the  above 
techniques  to  the  equivalent  histogram  resulting  from  numerical  convolution  of  the 
individual  histograms.  Shroff's  technique  extends  easily  in  the  case  of  multiplexed 
homogeneous  sources  but  becomes  more  difficult  with  heterogeneous  sources  [92]. 

5.         Application 

In the next section and the next chapter, an MMRP model is used to represent a
deterministically smoothed layered video traffic source. The model is used both as a
traffic source in OPNET simulations and as an analytical model for queuing calculations.
Model parameters were derived from the rate-controlled sequence shown in Chapter IV.
The actual parameters are given in Appendix B.

C.        SMOOTHING  LAYERED  VIDEO  TRAFFIC 

Traffic smoothing improves multiplexer performance; the implied benefit is a
degree of bandwidth conservation, which permits the network to guarantee QoS for a
given level of traffic with less bandwidth. This is particularly desirable for the low bit
rate network envisioned in Chapter II. While smoothing has been discussed previously,
coverage has focused on network-level traffic shaping for both traffic policing and
improving multiplexer performance. In this section, we propose a new smoothing
scheme targeting layered video that is notable in two ways. First, we focus on
developing a practical smoothing mechanism implemented at the sender prior to the
ATM layer. The goal is to avoid manipulating traffic streams at the ATM layer since
maintaining a separation between network and client layer functionality is desirable to
preserve network interoperability. Second, the smoothing mechanism covers three time
scales: frame level, layer level, and cell level. The first is considered briefly while the
latter two are the main focus of this section.

Based on previous discussion, rate control provides frame-level smoothing by
limiting variations from the target bit rate. This type of smoothing is obtained essentially
as a byproduct since rate control is a necessary component in ensuring compliance with
the traffic contract in ATM networks. As shown in Figure IV.26 and using the values
given in Table IV.9, the rate control mechanism discussed in Chapter IV produces an
approximately 16% decrease in burstiness.

1.         Cell  Level  Traffic  Smoothing 

While  rate  control  smoothes  variations  in  bit  rate  over  multiple  frames,  a  more 
explicit  approach  is  to  smooth  at  the  cell  level  by  controlling  interarrival  times  to  the 
ATM  multiplexer.  As  discussed  in  the  last  section,  this  exact  concern  partially  motivated 
Skelly's  [14]  histogram  traffic  model.  Following  Skelly's  approach  of  smoothing 
individual  frames,  we  propose  an  analogous  smoothing  scheme  implemented  via  a  leaky 
bucket  type  mechanism.  The  basic  approach  is  shown  in  Figure  V.6.  Smoothing 
proceeds  by  modulating  the  arrival  rate  into  the  network  for  each  individual  frame.  Each 
compressed  frame  is  buffered  prior  to  transmission  into  the  network,  and  portions  of  the 
compressed  frame,  termed  transmission  units  for  now,  are  released  for  transmission 
whenever  a  token  is  available.  Tokens  are  generated  at  a  fixed  rate  r  and  only  a  single 
token  is  available  at  a  time.  The  combined  effect  is  to  deterministically  smooth  the  flow 
of  transmission  units  by  releasing  them  for  transmission  at  intervals  of  1/r  seconds.  The 
token  rate  r  is  evaluated  anew  each  frame  and  is  set  to  the  arrival  rate  for  the  current 
frame  as  measured  in  transmission  units  per  second.  In  this  manner,  the  token  rate  is 
assigned  a  value  sufficient  to  ensure  that  the  entire  frame  is  transmitted  during  a  single 
frame interval. For example, if a transmission unit consists of 300 bits and the current
compressed frame size is 6000 bits, the token rate must be set to 20f tokens per second,
where f is the frame rate. Since this scheme occurs downstream from the video coder.
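The token rate calculation and the resulting release schedule can be sketched as follows. This is a minimal illustration, not the implemented mechanism: the function names are hypothetical, a partially filled final transmission unit is assumed to consume a whole token, and the frame rate of 10 frames per second in the example is an assumed value.

```python
import math


def token_rate(frame_bits, unit_bits, frame_rate):
    """Tokens per second needed to send the whole frame in one frame interval."""
    units = math.ceil(frame_bits / unit_bits)  # a partial unit still needs a token
    return units * frame_rate


def release_times(frame_bits, unit_bits, frame_rate, frame_start=0.0):
    """Deterministic release instants, spaced 1/r seconds apart."""
    units = math.ceil(frame_bits / unit_bits)
    r = token_rate(frame_bits, unit_bits, frame_rate)
    return [frame_start + k / r for k in range(units)]


# The example from the text: 6000-bit frame, 300-bit units, so r = 20 f.
# With an assumed f = 10 frames/s this gives 200 tokens/s.
r = token_rate(6000, 300, 10)
times = release_times(6000, 300, 10)
```

All 20 units are released at equal 5 ms spacing and the last one leaves before the 100 ms frame interval ends, which is the deterministic smoothing behavior described above.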
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rate  contfiltered to prioritize transmission consistent 
with the perceptual importance of each layer. The two schemes mentioned above are 
implemented using the filtering algorithms shown in Figure VI.4. With bandwidth 
sharing, the intent is to give a higher priority to the more perceptually important layers 
only when those layers are not receiving their desired QoS. Otherwise, all layers are 
treated in an equal manner. This QoS-based prioritization is implemented by zeroing out 
the CLPR of lower priority layers whenever a higher priority layer is not receiving the 
requisite QoS as indicated by a CLPR of greater than one. With priority sharing, a lower 
priority cell only receives service if no higher priority cells are available for service 
within the queue. This is accomplished by zeroing out the CLPR of lower priority layers 
whenever the CLPR of higher priority cells is non-zero, which indicates that those layers
have cells available for service. The filtering algorithms shown in Figure VI.4 assume
that each connection has three layers.



a) Bandwidth sharing:

    if (CLPRL[j,0] > 1) {
        CostL[j,1] = 0;
        CostL[j,2] = 0;
    }
    else if (CLPRL[j,1] > 1) {
        CostL[j,2] = 0;
    }

b) Priority sharing:

    if (CLPRL[j,0] > 0) {
        CostL[j,1] = 0;
        CostL[j,2] = 0;
    }
    else if (CLPRL[j,1] > 0) {
        CostL[j,2] = 0;
    }

Figure VI.4: Cost Filtering per Layer for a) Bandwidth Sharing and b) Priority Sharing.
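The two filters differ only in the CLPR threshold that triggers them, which the following sketch makes explicit for a three-layer connection. The function names are hypothetical, the inputs are per-layer lists for a single connection, and the filters return a new cost list rather than modifying the scheduler's state.

```python
def filter_costs(clpr_l, cost_l, threshold):
    """Zero the cost of layers below the first layer whose CLPR exceeds threshold."""
    cost = list(cost_l)
    if clpr_l[0] > threshold:
        cost[1] = 0.0
        cost[2] = 0.0
    elif clpr_l[1] > threshold:
        cost[2] = 0.0
    return cost


def filter_bandwidth_sharing(clpr_l, cost_l):
    """Prioritize only when a higher layer misses its QoS target (CLPR > 1)."""
    return filter_costs(clpr_l, cost_l, 1.0)


def filter_priority_sharing(clpr_l, cost_l):
    """Prioritize whenever a higher layer has cells waiting (CLPR > 0)."""
    return filter_costs(clpr_l, cost_l, 0.0)
```

With bandwidth sharing, a connection whose base layer has a CLPR of 0.8 keeps all three layer costs, whereas priority sharing would already zero the costs of layers 1 and 2.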

After filtering, the slot is assigned to the cell from the layer with the highest 
CLPR. At this point, two options were explored. Earlier work with the BCLPR 
algorithm indicated that selecting cells deep within a queue has a deleterious effect on 
throughput [93]. Selecting cells without regard to queue position may lead to the 
situation in which cells on the verge of expiration are ignored to service a cell from a 
connection with a higher cost even though that cell is in no immediate danger of 
expiration. The STEBR algorithm [39] corrects this by comparing the cost of denial of 
service for each connection on a global basis. However, the filtering algorithms re- 
introduce this problem to a certain extent by bypassing cells from a lower priority layer to 
service cells from higher priority layers as needed. Arguably, this is intentional since 
without the higher priority layers the lower priority layers produce no benefit to the 
receiver, and a lower throughput is acceptable to ensure that the appropriate cells are 
delivered. The tradeoff between throughput and priority service is examined by 
implementing service deferral. The ToE of the cell selected for service during the current 
time slot is examined. If the ToE indicates that the cell is not due to expire during the 
next time slot, service is deferred and the cell closest to expiration from that connection is 
selected for service instead. Service deferral therefore reverts back to STE [99] within a 
connection whenever possible. 
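Service deferral can be sketched as a small selection function. This is an illustration under assumptions: cells are represented as hypothetical (layer, ToE) pairs in queue order for the connection that won the slot, the ToE is converted to a whole number of slots with an assumed link rate in cells per second, and the threshold of two slots is a configurable parameter.

```python
import math


def select_cell(cells, winning_layer, cell_rate, threshold=2):
    """Pick the cell to serve from the winning connection.

    cells: list of (layer, toe_seconds) tuples in queue order.
    If the first cell of the winning layer is not close to expiring,
    defer it and serve the connection's cell with the lowest ToE instead.
    """
    chosen = next(c for c in cells if c[0] == winning_layer)
    if math.floor(chosen[1] * cell_rate) > threshold:
        return min(cells, key=lambda c: c[1])  # revert to shortest time to expiry
    return chosen
```

When the winning layer's cell is itself about to expire it is served directly, so deferral never sacrifices the prioritized cell.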
A complete summary of the algorithm is given in Figure VI.5. An OPNET model
that implements STEBR for layered video traffic is given in Appendix A.



1. Sort the queue in order of increasing ToE from the head of the queue.
2. Scan the queue from head to tail. For each cell:
   a. Calculate the cell’s ToE.
   b. If the ToE is less than the service interval:
      i. Discard the cell.
      ii. Increment DS[j] and DSL[j].
3. Update CLPR, CLPRL, and A for each connection and layer using Eq. (VI.1) through
   (VI.4).
4. Assign a connection cost to each cell using Eq. (VI.5) and Eq. (VI.6).
5. Assign each cell to a tentative time slot n = ⌊ToE × C⌋.
6. Assume that after step 5, N time slots are allocated.
7. For each time slot n from N down to 1:
   a. For every cell i in that time slot from connection j, layer k:
      i. If Cost[j] > 0:
         1. Increment Extra_Cells[j].
      ii. Else:
         1. Set Cost[j] = Cell_Cost[i].
   b. Find the largest Cost[j]. Assume the connection is j_x.
   c. If Cost[j_x] > 0, there is at least one cell awaiting service.
      i. If Extra_Cells[j_x] = 0:
         1. Set Cost[j_x] = -1.
      ii. Else:
         1. Decrement Extra_Cells[j_x].
         2. Reduce Cost[j_x] by A_j.
8. Connection j_x is assigned the time slot.
   a. For each layer k with no cells enqueued, set CLPRL[j_x,k] = 0.
   b. Filter the cost for each layer using Figure VI.4.
   c. Assume winning layer is k_x.
      i. With service deferral:
         1. If ⌊ToE × C⌋ > 2 for the selected cell:
            a. Service the cell from j_x with the lowest ToE.
         2. Else:
            a. Service the first cell from layer k_x.
      ii. Otherwise service the first cell from layer k_x.

Figure VI.5: Modified STEBR Algorithm.

2.  GOB Dropping



As discussed in Chapter III, correct decoding of the compressed video bit stream
requires that the decoder stay in sync with the bit stream. Bit errors and dropped cells
interrupt the decoding process and force the decoder to scan the bit stream until a
distinctive codeword is found to reset the decoding process. This is part of the rationale
for imposing a logical hierarchy on the bit stream (see Figure III.1). For the coder
proposed here, the information required to start the decoding process includes the start of
the next macroblock, the scene type, and the current quantizer setting. Other coders
might require additional or different information.* Since repeating this information
consumes bandwidth, a tradeoff is forced between minimizing this overhead and the
distance, in macroblocks, between resynchronization points. Most coders, therefore,
support resynchronization at the start of each GOB.† The result is that, after a stream
error, the decoder parses through the bit stream until a GOB header is recognized and
restarts decoding at that point. The intervening data between the stream error and the
GOB header is discarded, and the effect on the display is left up to the decoder.

The effect of dropped cells on the decoder has strong implications for the layered
scheduling algorithm proposed in the last section. As shown in Figure VI.6, a cell
dropped from within a GOB corrupts the GOB. Any cells remaining in the GOB are
unusable since their information payload will ultimately be discarded at the decoder. In
this case, making scheduling decisions based on CLPR is suboptimal since CLPR no
longer represents a valid indication of the impact of denying service on reconstructed
visual quality at the recipient. Indeed, dropping the remaining cells in the corrupt GOB
does not further degrade the quality of the reconstructed frame beyond that imposed by
the original cell drop. However, the effect of dropping the unusable cells is not merely
neutral. Removing these cells from contention increases the number of scheduling
opportunities available to cells that still have the potential to be successfully decoded.
Therefore, in a global sense, the overall effect on the reconstructed quality of all competing
connections is positive, especially if the released scheduling opportunities are biased
toward the higher priority layers in each connection. Since the layered STEBR algorithm
filters costs by layer, this is the expected outcome.

* An MPEG decoder would need the frame type (I, P, or B), for example [6].

† H.263 has a low bit rate mode that eschews GOB headers and resynchronizes only at frame headers [56].

Figure VI.6: The Effect of Cell Discard on a GOB.



To illustrate these points, consider the example shown in Figure VI.7. A 
scheduling slot k contains a layer 2 cell from connection A and layer 0 cells from 
connections B and C, respectively, with the connection costs shown. The layered STEBR 
mechanism grants the slot to that connection with the greatest overall cost and then filters 
by layer. Here, connection A would be granted the slot. Now, assume that the layer 2 
cell belongs to a broken GOB. Granting service to A will not improve the recipient’s 
quality, and denying service to connections B and C potentially corrupts two additional 
GOBs. Denying service to A, while appearing to degrade QoS to the connection, actually 
provides a global benefit since connection B receives an additional scheduling 
opportunity. 

[Diagram: scheduling slot k holding cells A-2, B-0, and C-0 with connection costs 0.91, 0.85, and 0.83, respectively.]

Figure VI.7: Competition Between Usable and Unusable Cells.

An analogous situation occurs with UBR connections carrying IP datagrams. If a
cell from a datagram is discarded, the remaining cells are unusable. If the IP datagrams
belong to a TCP connection, a single dropped cell forces the entire TCP segment to be 
retransmitted. As retransmissions reduce effective throughput, techniques such as partial 
packet discard respond to a dropped cell by dropping the remaining cells in the datagram. 
By removing these unusable cells from contention for scheduling, effective throughput is 
increased [18]. 

Based on the discussion above, we present a modification to the layered STEBR 
scheduling algorithm that implements partial GOB dropping. Partial GOB dropping 
consists of removing any cells remaining in a GOB following a dropped cell in that GOB. 
A similar approach proposed for high bandwidth MPEG-2 video traffic by Kuo and Ko 
[100] schedules slices for transmission only if sufficient bandwidth is available to 
transmit an entire slice without loss. The approach here is less stringent since scheduling 
assignments are made based on current queue occupancy, delay considerations do not 
allow determination of GOB length in real-time for low bit rate video traffic, and some 
partial benefit is derived by transmitting at least the beginning of the GOB. 

Since the video stream is layered, partial GOB dropping must take into account
both dropped cells within each GOB and the impact of GOB corruption in one layer on
related GOBs within other layers. Obviously, the greatest impact occurs when a GOB
from the base layer is corrupted. In that case, at least part of the information carried
within the associated GOBs of lower priority layers is also rendered unusable.
Corruption of a lower priority GOB does not appear to have the same consequence.
Based on subjective and quantitative evaluations using the coder from Chapter IV, a
tangible benefit is obtained by decoding and applying a lower priority enhancement
regardless of whether higher priority enhancement layers are successfully decoded.

Based on these observations, partial GOB dropping is implemented in the
following manner. If a cell is discarded from a base layer GOB, all remaining cells in
that GOB and all remaining cells in associated lower priority layer GOBs are discarded.
If a cell is dropped from within an enhancement layer GOB, all remaining cells in that
layer’s GOB are discarded. The impact of a cell discard in each situation is illustrated in
Figure VI.8.

Figure VI.8: Discard Policy Following a Cell Loss from: a) Base Layer GOB or b) Enhancement Layer GOB.

The base layer discard policy is actually somewhat severe since a cell loss from a
base layer GOB does not always invalidate information in enhancement layer GOBs.
Technically, a loss from the base layer GOB only invalidates information in enhancement
layers starting at the same spatial position, i.e., a macroblock, for decoding purposes.
Any information prior to this point is still usable, although coordinating the spatial
relationship of cells in different layers is not a trivial task. One possible approach is to
interleave cells from different layers in a manner that approximates the correct spatial
dependence such that when a cell from the base layer is dropped, loss of usable
information is minimized when dropping the remaining cells in the base and
enhancement layers. This approach is shown in Figure VI.9, where cells from the layer
GOBs have been interleaved according to spatial dependence. Now a loss of a base layer
cell results in a smaller number of cell discards compared to Figure VI.8. With the current
coder, this is not an issue since at low bit rates the enhancement layers are usually
restricted to a single ATM cell in length.

Figure VI.9: Interleaving Layer Cells to Minimize Information Loss.

GOB dropping is implemented in the following manner. For each connection, a 
flag is maintained for each layer, i.e., 3 flags per video source. The flag indicates the 
state of the current GOB in each layer, ‘RETAIN’ or ‘DROP’, and indicates whether the 
remaining cells in that GOB should be retained or dropped. Assuming that the current 
GOB has remained intact so far, a cell dropped due to expiration triggers a change in 
status from RETAIN to DROP. If the expired cell belongs to the base layer, the flags for 
the associated lower priority layer GOBs are also set to DROP. Each layer’s flag is reset 
to RETAIN at the start of a new GOB as indicated by either the SDU bit or a change in 
cell tags (see Figure II.11 and Figure II.13).

At the start of each scheduling slot, the queue is scanned from head to tail as 
previously described. The scheduler performs different actions for each cell depending 
on the status of its parent GOB. If the GOB status is RETAIN, the cell’s ToE is 
calculated. If the cell has expired, the cell is dropped, and the GOB’s status is changed to 
DROP. Again, if the cell belonged to the base layer, the enhancement layers are set in a 
similar manner. If the GOB status is DROP, the cell is examined to determine if the cell 
contains a GOB header, which indicates the start of a new GOB. If it does and the cell 
has not expired, the GOB status is toggled back to RETAIN. Otherwise, the cell is dropped
regardless of its ToE. This algorithm is summarized in Figure VI.10.
1. Scan the queue from head to tail.

2. For each cell from connection i, layer j:

   A. If status[i,j] = RETAIN:
      a. Calculate ToE.
      b. If ToE < service time:
         i. Status[i,j] = DROP.
         ii. Discard cell.
         iii. If j = 0:
            1. Status[i,k] = DROP ∀ k > 0.

   B. If status[i,j] = DROP:
      a. Check for GOB header.
      b. If new GOB header:
         i. Calculate ToE.
         ii. If ToE < service time:
            1. Discard cell.
         iii. Else:
            1. Status[i,j] = RETAIN.
      c. Else:
         i. Discard cell.



Figure VI.10: Partial GOB Dropping Algorithm.
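The per-cell decision in the partial GOB dropping algorithm can be sketched in C as follows. This is a minimal illustration rather than the simulation code of Appendix A: the `cell_t` structure, the function name `gob_filter_cell`, and the use of a precomputed ToE value are assumptions made for the example. The sketch follows the figure literally, so an unexpired GOB header in an enhancement layer restores that layer to RETAIN without consulting the base layer flag.

```c
#include <stdbool.h>

#define NUM_LAYERS 3   /* base layer plus two enhancement layers */

typedef enum { RETAIN = 0, DROP = 1 } gob_status_t;

/* Hypothetical, simplified view of a queued cell (not an OPNET type). */
typedef struct {
    int    layer;       /* 0 = base layer, 1..2 = enhancement layers */
    bool   gob_header;  /* true if the cell starts a new GOB */
    double toe;         /* time to expiration, in seconds */
} cell_t;

/* Apply steps 2.A and 2.B to one cell of one connection.
 * status[] holds that connection's per-layer flags and is updated in place.
 * Returns true if the cell must be discarded. */
bool gob_filter_cell(gob_status_t status[NUM_LAYERS], const cell_t *cell,
                     double service_time)
{
    bool expired = cell->toe < service_time;

    if (status[cell->layer] == RETAIN) {
        if (!expired)
            return false;                 /* GOB intact, cell still viable */
        status[cell->layer] = DROP;       /* rest of this GOB is unusable */
        if (cell->layer == 0) {
            /* A base layer loss invalidates the dependent enhancement GOBs. */
            for (int k = 1; k < NUM_LAYERS; k++)
                status[k] = DROP;
        }
        return true;
    }

    /* Status is DROP: only an unexpired GOB header restarts the layer. */
    if (cell->gob_header && !expired) {
        status[cell->layer] = RETAIN;
        return false;
    }
    return true;
}
```

A scheduler would call this function once per queued cell while scanning from head to tail, removing each cell for which it returns true.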



D. RESULTS 

Performance of the layered STEBR algorithm was validated using OPNET. The 
scenario simulated was a network configured as shown in Figure VI.11 with three layered
video sources. Each layered source transmits at a mean bit rate of 80 kbps and is 
represented within the simulation using the MMRP traffic model discussed in Chapter V. 
An OPNET model for a layered video source is given in Appendix A, and the model 
parameters are given in Appendix B. The bit allocation among the layers was set at 
2:1:1. The requested QoS for each connection consists of a maxCTD of 50 ms and a CLP 
of 10'^. Each layer is assigned the same CLR. While the CLR is high for video traffic, 
the value chosen shortens simulation time while still giving a valid demonstration of the 
algorithm’s behavior under different loads. Since the performance of the STEBR 
algorithm with heterogeneous traffic has been presented thoroughly elsewhere [39], only 
the homogeneous traffic case is considered here.
Figure VI.11: Network Scenario.

The first issue considered was the ability of the QoS filtering algorithms listed in 
Figure VI.4 to shift bandwidth to the higher priority base layer as network load was 
increased to simulate congestion and the corresponding impact on connection throughput. 
The first filtering approach considered was bandwidth sharing. 

The premise of service deferral is sustaining the maximum possible throughput by 
deferring service of a selected cell provided that cell will not expire if not granted 
immediate service. Figure VI.12 shows the impact of service deferral on the CLR for
each layer as network load is increased. As long as the base layer is receiving its required 
QoS, all layers are treated in approximately the same manner. As network load increases 
and connections experience CLRs exceeding the required CLR of 10'^, the scheduler 
adapts by denying service to the higher layers whenever possible. However, with service 
deferral, cells from lower priority layers are still granted service unless a higher priority 
cell is present and about to expire. The result is that, while the scheduler violates QoS for 
the base layer last, QoS cannot be maintained over a wide range. Consequently, the gap 
in CLR between the base and enhancement layers stays relatively constant at one order of 
magnitude, and QoS between the enhancement layers is not differentiated at all. 
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Figure VI.12: CLR for Bandwidth Sharing and Service Deferrals. 



The same scenario without service deferral is shown in Figure VI.13. Now a
higher priority cell receives priority service if the layer is not receiving its requisite QoS,
regardless of the cell’s position within the queue. The result is that, as network load is
increased, the required QoS for the base layer is maintained regardless of network
loading, and a clear delineation exists in the treatment of the enhancement layers. Comparing
Figure VI.13 with Figure VI.12, the bandwidth required to maintain QoS for the base
layer comes primarily from denying service to layer 2 cells, i.e., the second enhancement
layer, as desired.




Figure VI.13: CLR for Bandwidth Sharing and No Service Deferrals. 



The performance of priority sharing was also considered with and without service 
deferral. With service deferral, the result is identical to Figure VI.12. Service deferral 
renders the cost function irrelevant since the cost function is effectively applied only if 
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the chosen cell is about to expire. Without service deferral, the impact of priority sharing 
is shown in Figure VI.14. Since the base layer receives priority any time a cell is present,
the scheduler actually prevents any observable cell loss in the base layer for the network 
loads examined. Once again, the bandwidth required comes at the expense of the second 
enhancement layer as desired. However, the first enhancement layer receives the best 
service out of all three scenarios. 
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Figure VI.14: CLR for Priority Sharing and No Service Deferrals. 



Comparing Figures VI.12 through VI.14, priority sharing without service
deferrals gives the best performance with respect to maintaining or exceeding the QoS for
the layers in the hierarchical order desired. However, the QoS of the base layer cannot be
arbitrarily controlled without impacting the connection’s throughput. The throughput for
each of the scenarios above is shown in Figure VI.15 and indicates that closer regulation
of the CLR comes at the price of decreasing throughput for that connection. Since some
loss can be tolerated in the base layer, as indicated by the QoS parameters supplied as part
of the traffic contract, the priority sharing algorithm was deemed unsuitable. The remaining
discussion covers only the bandwidth sharing algorithm.
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Figure VI. 15: Throughput under Each Scheduling Scheme. 



The next issue considered is the effect of partial GOB dropping for each of the 
two remaining scenarios. With GOB dropping, throughput is expected to decrease since 
at least part of the traffic allowed through will be unusable at the decoder. The results for 
bandwidth sharing and service deferral are shown in Figure VI.16. Compared to Figure
VI.12, better performance is delivered in terms of CLR for each layer although the
difference grows successively smaller with increasingly higher network loads. The 
improvement is the most notable with the second enhancement layer. Also a marked 
differentiation in QoS is observed for both of the enhancement layers that did not exist 
prior to GOB dropping. 




Figure VI.16: CLR for Bandwidth Sharing, Service Deferrals, and GOB Dropping. 
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The effect of GOB dropping without service deferrals is shown in Figure VI. 17. 
The scheduler is still able to maintain the requisite CLR for the base layer. The effect on
the enhancement layers is mixed. Control over the CLR for the first enhancement layer is
improved relative to Figure VI.13 at network loads below 0.8. Above this point, CLR
increases. The CLR for the second enhancement layer is higher regardless of the network
load. The greater loss, however, results in the improved CLR observed for the first
enhancement layer at lower network loads. At higher network loads, the impact of cell
drops from the base layer dominates. Since a cell dropped from a base layer GOB causes
the GOBs of the first and second enhancement layers to be discarded, the CLRs for the
first and second enhancement layers start to converge.




Figure VI.17: CLR for Bandwidth Sharing, No Service Deferrals, and GOB Dropping.

The impact of GOB dropping on throughput, with and without service deferral, is 
shown in Figure VI.18. Remarkably, in both cases, only a small decrease is observed in
throughput and then only at high network loads. However, service deferrals still result in 
higher throughput overall. 

Considering the joint effects of service deferral and GOB dropping on layered 
scheduling, the scheduler is able to more aggressively utilize bandwidth released by 
dropping non-viable cells to improve service for all layers. However, service deferral is 
unable to maintain the requisite CLR for the base layer at high network loads with or 
without GOB dropping. Without service deferral, throughput is impacted since the 
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scheduler gives greater priority to winning cells, which tend to be base layer cells at the 
higher loads. GOB dropping does allow the scheduler to reallocate bandwidth, at the 
expense of the second enhancement layer, to improve the CLR for the first enhancement 
layer at network loads below 0.8 and maintain the required service for the base layer. 




Figure VI.18: Throughput Variation with Partial GOB Dropping. 

Comparing Figure VI.16 with Figure VI.17, service deferrals actually produce
slightly better overall service, as demonstrated by lower CLR for the base layer and 
higher connection throughput, for network loads that maintain the base layer CLR below 
the target CLR. As network load increases, forgoing service deferrals results in better 
service to the base layer in terms of reduced CLR. These results suggest that the most 
effective scheduling scheme is actually a hybrid of the two approaches: use service 
deferrals when the base layer is receiving its required QoS and drop service deferrals 
when the base layer is not receiving its required QoS. 
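The hybrid rule suggested above reduces to a single comparison per connection. The sketch below is illustrative only: the function name and the use of running loss and arrival counters for the base layer are assumptions, and the simulations reported in this chapter did not implement this hybrid.

```c
#include <stdbool.h>

/* Returns true when service deferral should be used for a connection,
 * i.e., while the measured base layer CLR does not exceed the target CLR.
 * base_cells_lost and base_cells_offered are hypothetical running counters
 * maintained by the scheduler for the base layer. */
bool use_service_deferral(long base_cells_lost, long base_cells_offered,
                          double target_clr)
{
    if (base_cells_offered <= 0)
        return true;   /* no history yet: assume QoS is being met */

    double measured_clr = (double)base_cells_lost / (double)base_cells_offered;
    return measured_clr <= target_clr;
}
```

With this rule, the scheduler favors throughput while the base layer contract is met and reverts to immediate priority service once the base layer CLR exceeds its target.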

The final issue examined is how the cells with related GOBs are arranged within 
the cell flow. Each base layer GOB has two associated enhancement layer GOBs. The 
partial GOB dropping algorithm discards upper layer cells whenever a base layer cell is 
discarded. However, the number of cells actually discarded depends on how the cells 
from the individual layers are arranged, concatenated or interleaved in a manner that 
reflects the actual spatial dependency among the cells in the different layers as shown in 
Figure VI.9. The goal is to minimize information loss by only dropping those upper layer 
cells that are rendered unusable by a drop in the base layer. 
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To examine this idea, the bit allocation among the layers was increased to 4:2:2. 
While this is the same relative ratio considered earlier, each layer’s GOB is now doubled 
in size to increase the effect of partial GOB dropping. Two arrangements were 
considered as shown in Figure VI.19. The first arrangement concatenates cells from each
layer. The second arrangement interleaves the cells. In either case, base layer GOB 
headers occur every eight cells. 
Figure VI.19: Cell Arrangements Considered for a 4:2:2 Bit Allocation: a) Concatenated or b) Interleaved.
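The benefit expected from interleaving can be illustrated by counting the enhancement layer cells that follow a lost base layer cell within one eight-cell group. The sketch below rests on two assumptions that are not taken from the simulation model: the interleaved ordering shown is only one plausible arrangement, since the figure is not reproduced here, and every cell that follows the lost base layer cell within the group is treated as discarded.

```c
#include <stddef.h>

/* Layer of each cell in one eight-cell group (0 = base, 1 and 2 =
 * enhancement layers). Both orderings are illustrative assumptions. */
static const int CONCATENATED[8] = { 0, 0, 0, 0, 1, 1, 2, 2 };
static const int INTERLEAVED[8]  = { 0, 1, 0, 2, 0, 1, 0, 2 };

/* Count the enhancement layer cells located after position lost_idx,
 * i.e., the cells discarded as a consequence of losing the base layer
 * cell at that position under the simplified model described above. */
int enh_cells_discarded(const int *layout, size_t n, size_t lost_idx)
{
    int count = 0;
    for (size_t i = lost_idx + 1; i < n; i++) {
        if (layout[i] != 0)
            count++;
    }
    return count;
}
```

Under these assumptions, losing any base layer cell in the concatenated arrangement discards all four enhancement layer cells, whereas in the interleaved arrangement the enhancement layer cells already sent ahead of the lost cell survive.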

The effect of each cell arrangement using bandwidth sharing and service deferrals 
is shown in Figure VI.20. For the base and first enhancement layers, interleaving cells 
from different layer GOBs improves CLR consistently regardless of the network load. 

Not surprisingly, the improved CLR comes at the expense of higher CLR for the second 
enhancement layer over the range of network loads examined. However, performance is 
judged unacceptable since, although a clear differentiation in service exists for each layer, 
QoS degrades for each layer at approximately the same rate instead of favoring the base 
layer at the higher network loads. Throughput differences for each case were negligible. 
Figure VI.20: Relative Effect of Interleaving and Concatenating on CLR with
Bandwidth Sharing and Service Deferrals.

The effect of each cell arrangement using bandwidth sharing without service deferrals
is shown in Figure VI.21. Interleaving gives a similar performance benefit to the one
discussed in the last paragraph. CLRs are improved for both the base and first
enhancement layers. There are two notable distinctions between concatenating and
interleaving. As observed previously, forgoing service deferrals allows the scheduler to
maintain the requisite QoS for the base layer. Whether cells are concatenated or
interleaved, the same is observed in Figure VI.21. However, interleaving still improves CLR by a small
measure at each network load examined. For the first enhancement layer, unlike previous
simulations, interleaving allows QoS to be maintained up to a network load of 0.8,
although the CLR increases rapidly beyond this point. Also notable is the observation that
interleaving improves the CLR for the second enhancement layer up to network loads
exceeding 0.8 although performance degrades relative to concatenation after this point.
Again, throughput differences for each case were negligible.




Figure VI.21: Relative Effect of Interleaving and Concatenating on CLR with
Bandwidth Sharing and Without Service Deferrals.



This chapter presented a scheduling algorithm for layered video traffic based on 
the STEBR algorithm originally proposed by Uziel [39]. The STEBR algorithm provides 
optimal scheduling for heterogeneous traffic, where each connection possibly has 
different CLR and CTD requirements. The hierarchical nature of layered video is 
introduced through a prioritization scheme that denies service to cells from lower priority 
layers during periods of congestion, thereby increasing the probability that cells from 
higher priority layers are transmitted. In this manner, the quality of the reconstructed
video degrades in a more graceful manner than if cells were dropped indiscriminately from the
connection. Effective throughput is increased through partial GOB dropping which also
drops cells determined to be unusable to the decoder. Dropping these cells increases 
scheduling opportunities for viable cells and increases the probability of transmitting 
higher priority GOBs intact. 
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VII. CONCLUSIONS 



A. SUMMARY OF WORK 

Motivated in part by the US Navy’s IT-21 initiative, there has been considerable
interest in deploying multimedia applications over tactical networks. Tactical networks 
may be characterized as low bit rate, unreliable, and heterogeneous. Multimedia 
applications, especially those incorporating video, tend to be bandwidth intensive and 
sensitive to transmission errors. Traditional multimedia processing techniques do not 
take these constraints into account. 

This work investigated issues related to distributing low-bit-rate video within the 
context of a teleconferencing application deployed over a tactical ATM network. The 
main objective was to develop mechanisms that support transmission of low-bit-rate 
video streams as a series of scalable layers that progressively improve quality. These 
mechanisms exploit the hierarchical nature of the layered video stream along the 
transmission path from the sender to the recipients to facilitate transmission. 

Specifically, the approach proposed in this dissertation works across the application-network
interface by coding the video stream into layers, shaping the resulting layered
video stream prior to entry into the network, and prioritizing service in accordance with 
the relative perceptual importance of each layer. 

A new layered video coding scheme was developed that includes a number of 
original contributions. This work codified some of the design issues required for an 
effective layered coder. How to layer the video stream effectively is an elementary design
issue. To address this, a series of heuristic rules were proposed that lead to effective 
layering structures for motion video via wavelet-based subband decomposition. These 
rules stem from a simple split-and-merge algorithm that uses subband variance as a 
measure of perceptual relevance. By grouping subbands of like variance and assigning 
subbands to layers in order of perceptual importance, the video stream is divided into the 
requisite number of layers. We applied this heuristic rule set and devised a three-layer 
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coding scheme for low-motion video. Employing a common layering scheme for both 
motion video and static presentation slides yielded poor results due to their different 
energy distributions among the subbands and differing perceptual weighting of high 
frequency content. Consequently, we devised a separate scheme in which each layer 
incorporates contributions from all frequency bands. 

A new suboptimal rate control scheme for layered video was developed. Using 
classical rate-distortion approaches, constraining the bit rate for a layered video stream 
using k quantizers involves simultaneously solving k cost functions. In this work, a
simpler approach replaced the k-dimensional rate-distortion problem with a one-dimensional
operational rate-distortion curve generated from a set of suboptimal
quantizer vectors. Rate control is then implemented via a table lookup into a codebook 
containing the suboptimal quantizer vectors. 
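The table lookup can be sketched as follows. The structure layout, the function name, and the ordering of the codebook by decreasing rate are assumptions made for illustration, and the selection rule shown (the finest quantizer vector whose expected rate fits the budget) is one plausible reading of the scheme rather than the coder's actual implementation.

```c
#include <stddef.h>

#define K 3   /* number of layers, one quantizer per layer */

/* One point on the operational rate-distortion curve (hypothetical layout). */
typedef struct {
    double rate_bits;  /* expected bits per frame for this quantizer vector */
    int    q[K];       /* quantizer setting for each layer */
} codebook_entry_t;

/* The codebook is assumed to be sorted by decreasing rate, i.e., by
 * increasing distortion. Returns the index of the first entry whose
 * expected rate fits within budget_bits, or the coarsest entry if none
 * fits. The caller must supply n > 0. */
size_t select_quantizer_vector(const codebook_entry_t *cb, size_t n,
                               double budget_bits)
{
    for (size_t i = 0; i < n; i++) {
        if (cb[i].rate_bits <= budget_bits)
            return i;
    }
    return n - 1;
}
```

Because the search runs over a single ordered list, choosing the quantizer vector for a frame costs at most n comparisons instead of a joint search over k quantizers.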

The effect of traffic smoothing, prior to network entry, on queuing performance
and scheduling efficiency was examined. The approach investigated smoothing at three
time scales: frame, layer, and cell interarrival. Smoothing at the frame level is performed
by the rate controller and requires no special implementation. Smoothing within the
frame is accomplished using a leaky-bucket mechanism whose token rate changes each
frame. Implementations were proposed for transmitting layers over a single VCI and
over multiple VCIs, and the implications of positioning the leaky bucket prior to the
ATM layer were considered.
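A leaky bucket whose token rate is reset at each frame boundary can be sketched as follows. The structure, the function names, and the choice of setting the rate to the number of cells in the frame divided by the frame period are assumptions made for illustration, not the smoother developed in this work.

```c
/* Hypothetical leaky-bucket state; tokens are measured in cells. */
typedef struct {
    double tokens;     /* tokens currently available */
    double rate;       /* token generation rate in cells per second */
    double depth;      /* maximum number of tokens that may accumulate */
    double last_time;  /* time of the last update, in seconds */
} leaky_bucket_t;

/* Reset the token rate at the start of a frame so that the frame's cells
 * are spread evenly across the frame period. */
void lb_start_frame(leaky_bucket_t *lb, int cells_in_frame,
                    double frame_period, double now)
{
    lb->rate = cells_in_frame / frame_period;
    lb->last_time = now;
}

/* Accumulate tokens up to the bucket depth, then try to send one cell.
 * Returns 1 if the cell may be sent at time now, 0 if it must wait. */
int lb_try_send(leaky_bucket_t *lb, double now)
{
    lb->tokens += (now - lb->last_time) * lb->rate;
    if (lb->tokens > lb->depth)
        lb->tokens = lb->depth;
    lb->last_time = now;

    if (lb->tokens >= 1.0) {
        lb->tokens -= 1.0;
        return 1;
    }
    return 0;
}
```

A bucket depth of one cell forces cells to leave at the current token rate, while a larger depth permits short bursts within the frame.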

The problem of prioritizing cell scheduling in layered video traffic was 
investigated to enable a more graceful degradation in received video quality during 
periods of high cell loss. QoS at the connection level is maintained using the STEBR
algorithm originally proposed by Uziel [39]. Within the connection, a prioritization
scheme denies service to cells from lower priority layers as required to maintain the
requisite QoS, in terms of cell loss rate, for higher priority layers. This
ensures that reconstructed video quality degrades more gracefully than if cells were 
dropped indiscriminately from the connection. Since the decoder resynchronizes using 
GOB headers following data loss, a cell dropped within a GOB renders any remaining 
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cells in the GOB unusable. We proposed partial GOB dropping to increase effective 
throughput by intelligently discarding related cells deemed unusable that would otherwise 
compete for and waste scheduling opportunities. 

Scheduling at the layer level, in addition to the connection level, requires a means 
for associating cells with layers. Also, partial GOB dropping requires the scheduler to 
have the ability to identify GOB headers within each layer. Two approaches were 
considered. The first approach assigns each layer to a separate VCC using AAL5. This 
approach is the simplest in terms of implementation but requires increased signaling in 
multicast scenarios. The second approach multiplexes each layer across a single VCC 
using AAL2. This approach offers quicker call establishment and minimizes signaling in 
multicast scenarios but requires modification to the CPCS sublayer and does not scale 
beyond four layers. 

B. SUGGESTIONS FOR FUTURE RESEARCH 

The coder as proposed in Chapter IV supports only 8-bit grayscale video. 
Extension to 24-bit color video is a natural step in the maturation of the coder design. 
Video capture usually results in three planes, a luminance plane and two color
difference planes, each with the same resolution as the original frame. Since the HVS is
more sensitive to variations in luminance than color, the color planes are normally
subsampled relative to the luminance plane [6]. With 4:2:0 subsampling, each 16x16
macroblock in the original frame is represented as a 16x16 luminance macroblock and
two 8x8 color difference macroblocks. The work presented in Chapter IV applies only to
the luminance portion of the picture. More research is required to investigate a general 
layering structure for the color difference components. While the frequency 
characteristics of the color components might be expected to mirror those of the 
luminance components, the perceptual importance of those components clearly does not. 
In the quantization matrix suggested for the color components by the JPEG standard, 
little discrimination is made between low and high frequencies, between vertical and 
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horizontal detail [66]. Whether a separate approach is required for the color content of 
static slides also bears consideration. 

One area not fully exploited by the proposed coder is recent advancements in 
entropy coding. One promising area of research is the concept of reversible codes, i.e., 
codes that are uniquely decipherable by parsing forward or backward through the 
bitstream. With a reversible code, the decoder would respond to a stream interruption by 
buffering the bitstream until the next GOB header is located. Then the decoder could 
parse backwards to recover a portion of the corrupted GOB. An interesting analysis 
could focus on the relative benefits of reversible coding and partial GOB dropping since 
the two approaches could not coexist. 

Other issues concerning the coder design that were only partially investigated 
include rate control at the macroblock level and error concealment schemes at the 
decoder. The results presented in Chapter IV only incorporate rate control at the frame 
level in which the quantizer vector is changed solely at the beginning of each new frame. 
Tighter control is possible by implementing rate control at the macroblock level and 
allowing the quantizer vector to change within the frame. The issue is whether changes 
to the quantizer vector within the frame would be distinctly perceptible. The final coder 
issue is implementing error concealment at the decoder. The decoder may use error 
concealment to compensate for incomplete information when reconstructing a frame. A 
simple but effective technique implemented here is zeroth order error concealment. If the 
decoder cannot determine if a macroblock should have been updated, the corresponding 
macroblock in the last frame is used by default. This is particularly effective with low 
motion video. More aggressive approaches to consider would employ prediction or 
interpolation to estimate missing coefficients from adjacent macroblocks. 
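Zeroth order error concealment amounts to copying the co-located macroblock from the previously reconstructed frame. The sketch below assumes 8-bit grayscale frames stored in row-major order with a width that is a multiple of 16; the function name and its parameters are illustrative and are not taken from the decoder.

```c
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16   /* macroblock width and height in pixels */

/* Replace macroblock (mb_x, mb_y) of the current frame with the co-located
 * macroblock of the previous frame. Both frames are 8-bit grayscale,
 * row-major, and width pixels wide. */
void conceal_macroblock(uint8_t *cur, const uint8_t *prev, int width,
                        int mb_x, int mb_y)
{
    for (int row = 0; row < MB_SIZE; row++) {
        size_t off = (size_t)(mb_y * MB_SIZE + row) * (size_t)width
                   + (size_t)mb_x * MB_SIZE;
        memcpy(cur + off, prev + off, MB_SIZE);
    }
}
```

For low-motion video the copied block is usually close to the missing one, which is why this simple approach performs well.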

The MMRP model appears quite effective at representing VBR video, and the 
associated queuing analysis tools are mature. However, the approach recommended by 
Skelly et al. [14] uniformly quantizes the video stream. Experimentally determined 
histograms demonstrate that video, regardless of the motion content, is decidedly non-uniform
in distribution [27]. Since MMRP queuing techniques stem from an estimate of
the bit rate distribution, an accurate representation of this distribution is essential. Given
that video is not distributed uniformly, non-uniform quantization schemes bear
examination to improve the representation for a given number of states. One approach is
to use Lloyd-Max quantizers [6]; alternatively, an optimal representation could be developed
directly from the original histogram.

The STEBR algorithm provides a powerful, optimal scheduling algorithm for 
CBR and VBR real-time traffic with constraints on CLR and CTD. Two extensions 
appear worth further consideration to extend the algorithm. First, the STEBR algorithm 
makes scheduling decisions based on the past history of each connection and the current 
queue state assuming that no further arrivals take place during the current scheduling slot. 
A possible extension is to modify the cost function to consider the impact of predicted 
near-term arrivals for each connection. Predicting future arrivals requires that the 
scheduler maintain a suitable traffic model for each connection or an aggregate of related 
connections. Modeling bursty sources appears difficult in the context of real-time 
scheduling decisions, as opposed to buffer sizing, but predicting the behavior of 
multiplexed traffic, as typified by the approach taken for VBR video in [95], may prove 
feasible. 

Another worthwhile extension to STEBR is to incorporate the UBR and ABR 
service categories to create a uniform optimal scheduling algorithm. As STEBR is cost-based,
extension requires construction of a cost function suitable for each service
category. For example, UBR connections can be assigned a permanent cost of one, thus 
restricting service unless all other connections are receiving their required QoS. Such an 
assignment appears suitable since UBR connections are assigned unutilized bandwidth 
from CBR and VBR connections. A suitable cost function for ABR is the ratio of MCR 
to instantaneous cell rate granted by the scheduler. However, ABR throughput benefits 
from employing feedback to regulate the sender’s transmission rate both to match 
available bandwidth and to fairly share available bandwidth among all the active ABR 
connections. A scheme for incorporating these mechanisms into the STEBR algorithm 
requires additional consideration. 
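The two cost assignments suggested above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the scheduler serves the connection with the largest cost and that a cost above one indicates a connection not receiving its required QoS; both assumptions are inferred from the discussion here rather than taken from the STEBR definition in [39].

```c
#include <math.h>
#include <stddef.h>

/* UBR: permanent cost of one, so a UBR connection is served only when no
 * other connection has a cost above one. */
double ubr_cost(void)
{
    return 1.0;
}

/* ABR: ratio of the minimum cell rate (MCR) to the instantaneous cell rate
 * granted by the scheduler. A connection granted no service is given the
 * largest possible cost. */
double abr_cost(double mcr, double granted_rate)
{
    if (granted_rate <= 0.0)
        return HUGE_VAL;
    return mcr / granted_rate;
}

/* Assumed selection rule: the connection with the largest cost wins the
 * scheduling slot; ties go to the lowest index. */
size_t pick_winner(const double *cost, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (cost[i] > cost[best])
            best = i;
    }
    return best;
}
```

The feedback mechanisms that ABR relies on to regulate the sender's transmission rate are outside the scope of this sketch.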
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APPENDIX A. OPNET MODEL CODE 



This appendix contains the OPNET process models used to generate the 
simulation results shown in Chapters V and VI. Each process model consists of a finite
state machine and a series of code segments that implement the behavior required for 
each state. 

A. LAYERED VIDEO SCHEDULER 

The OPNET model for the layered scheduler implements the layered scheduling
algorithm discussed in Chapter VI. Specifically, STEBR is used to select the winning
connection at the beginning of each service interval, and the CLPRs for each layer are
filtered and compared to determine the winning layer. The code also implements partial
GOB dropping as an option. The scheduler assumes that each source transmits using
only a single VCC (see Figure II.12). While the code is specifically tailored for the
homogeneous traffic case, the model is easily extended to heterogeneous traffic by
storing the connection type with the connection’s VCC and performing QoS filtering if
the connection is carrying layered video. The finite state machine is shown in Figure
A.1.
Figure A.1: Finite State Machine for Scheduler Process Model.



1. Header Block 



#include "ams_pk_support.h"
#include <math.h>

#define QUEUE_EMPTY      (op_q_empty ())
#define SVC_COMPLETION   op_intrpt_type () == OPC_INTRPT_SELF
#define ARRIVAL          op_intrpt_type () == OPC_INTRPT_STRM

#define VCI_BASE     100
#define MAX_SOURCE   7
#define MAX_LAYER    3

#define DROP         1
#define RETAIN       0
#define NEWHEADER    1

void order_queue (int);
int expire_cells (void);


2. State Variable Block 



int          \server_busy;
double       \service_rate;
Objid        \own_id;
Stathandle   \clp_handle;
int          \cell_count[MAX_SOURCE];
int          \layerCellCount[MAX_SOURCE][MAX_LAYER];
int          \cells_dropped[MAX_SOURCE];
int          \layerCellsDropped[MAX_SOURCE][MAX_LAYER];
double       \clp;
double       \pk_svc_time;
Stathandle   \cell_handle;
int          \cells_serviced;
Stathandle   \util_handle;
double       \maxCTD;
double       \maxCLP;
int          \cells_waiting[MAX_SOURCE];
int          \gobDrop[MAX_SOURCE][MAX_LAYER];
Stathandle   \clpr0_handle;
Stathandle   \clpr1_handle;
Stathandle   \clpr2_handle;

3. Temporary Variable Block

Packet*                    pkptr;
int                        insert_ok;
int                        num_cells;
int                        ix, jx;
int                        source_id;
int                        layer_id;
AtmT_Cell_Header_Fields*   atm_hdr_ptr;
int                        total_arrived;
int                        total_dropped;
double                     minToE;
int                        cell_to_send;
double                     iCLP;
double                     clpr[MAX_SOURCE];
double                     delta[MAX_SOURCE];
double                     max_clpr;
int                        winner;
int                        winningLayer;
int                        extra_cells[MAX_SOURCE];
double                     cost[MAX_SOURCE];
int*                       service_slot;
int*                       slotSourceID;
double*                    cell_cost;
int                        q_index;
int                        slot;
int                        max_index;
double                     max_cost;
int                        done;
double                     iLayerCLP;
double                     layerCLPR[MAX_SOURCE][MAX_LAYER];
int                        layerCellsWaiting[MAX_SOURCE][MAX_LAYER];
double                     filteredCost[MAX_LAYER];
double                     filteredCLPR[MAX_LAYER];
double                     max_CLPR;

4. Function Block



expire_cells() removes cells from the queue that have expired or that must be discarded
under the partial GOB dropping algorithm. With partial GOB dropping, a flag indicates the status
for each layer within a connection. An expired cell toggles the flag to “DROP”. If the
expired cell belongs to the base layer, flags are set to “DROP” for each of the other
layers. GOB headers reset the flags to “RETAIN”. Partial GOB dropping may be
disabled by commenting out the lines highlighted in bold.

int expire_cells (void)
{
    int num_cells;
    int source_id;
    int layer_id;
    int gobHeader;
    int ix;
    Packet* pkptr;
    AtmT_Cell_Header_Fields* atm_hdr_ptr;

    /* Find the number of cells in the queue. */
    num_cells = op_subq_stat (0, OPC_QSTAT_PKSIZE);

    /* Remove cells that cannot complete service before expiring, starting at the */
    /* tail of the queue. */
    ix = 0;
    while (ix < num_cells) {
        pkptr = op_subq_pk_access (0, ix);
        op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
        source_id = (atm_hdr_ptr->VCI - VCI_BASE);
        layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
        gobHeader = atm_hdr_ptr->GFC;

        if (gobDrop[source_id][layer_id] == DROP) {
            if (gobHeader == NEWHEADER) {
                if ((maxCTD - op_q_wait_time (pkptr)) < pk_svc_time) {
                    /* The GOB header itself has expired; discard it. */
                    pkptr = op_subq_pk_remove (0, ix);
                    op_pk_destroy (pkptr);
                    op_prg_mem_free (atm_hdr_ptr);
                    cells_dropped[source_id]++;
                    layerCellsDropped[source_id][layer_id]++;
                    num_cells--;
                    cells_waiting[source_id]--;
                }
                else {
                    /* A live GOB header resets the drop flags. */
                    if (layer_id == 0) {
                        gobDrop[source_id][0] = RETAIN;
                        gobDrop[source_id][1] = RETAIN;
                        gobDrop[source_id][2] = RETAIN;
                    }
                    else {
                        if (gobDrop[source_id][0] == RETAIN) {
                            gobDrop[source_id][layer_id] = RETAIN;
                        }
                    }

                    /* Reload the header field struct. */
                    op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
                        op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
                    ix++;
                }
            }
            else {
                /* The cell belongs to a corrupted GOB; discard it. */
                pkptr = op_subq_pk_remove (0, ix);
                op_pk_destroy (pkptr);
                op_prg_mem_free (atm_hdr_ptr);
                cells_dropped[source_id]++;
                layerCellsDropped[source_id][layer_id]++;
                num_cells--;
                cells_waiting[source_id]--;
            }
        }
        else if ((maxCTD - op_q_wait_time (pkptr)) < pk_svc_time) {
            /* The cell has expired; discard it and flag the rest of its GOB. */
            pkptr = op_subq_pk_remove (0, ix);
            op_pk_destroy (pkptr);
            op_prg_mem_free (atm_hdr_ptr);
            cells_dropped[source_id]++;
            layerCellsDropped[source_id][layer_id]++;
            num_cells--;
            cells_waiting[source_id]--;

            gobDrop[source_id][layer_id] = DROP;
            if (layer_id == 0) {
                gobDrop[source_id][1] = DROP;
                gobDrop[source_id][2] = DROP;
            }
        }
        else {
            /* Reload the header field struct. */
            op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
                op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
            ix++;
        }
    } //while

    return num_cells;
}
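The drop-flag rules described above can be isolated from the OPNET queue calls. The following is a minimal sketch in plain C, assuming three layers and arbitrary numeric values for RETAIN and DROP (their definitions are not shown in this listing); the helper names on_cell_expired and on_gob_header are hypothetical and do not appear in the process model.

```c
#include <assert.h>

/* Assumed values; the listing does not show how RETAIN and DROP are defined. */
enum { RETAIN = 0, DROP = 1 };
enum { NUM_LAYERS = 3 };

/* An expired cell marks its layer for dropping. Losing a base-layer
   cell (layer 0) also marks every enhancement layer. */
static void on_cell_expired(int flags[NUM_LAYERS], int layer_id)
{
    assert(layer_id >= 0 && layer_id < NUM_LAYERS);
    flags[layer_id] = DROP;
    if (layer_id == 0) {
        for (int k = 1; k < NUM_LAYERS; k++) {
            flags[k] = DROP;
        }
    }
}

/* A GOB header that is still alive resets flags. A base-layer header
   resets all layers; an enhancement-layer header resets only its own
   layer, and only while the base layer is intact. */
static void on_gob_header(int flags[NUM_LAYERS], int layer_id)
{
    assert(layer_id >= 0 && layer_id < NUM_LAYERS);
    if (layer_id == 0) {
        for (int k = 0; k < NUM_LAYERS; k++) {
            flags[k] = RETAIN;
        }
    } else if (flags[0] == RETAIN) {
        flags[layer_id] = RETAIN;
    }
}
```

In this sketch one flag array stands for a single connection; the process model keeps one such array per source.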






order_queue() reorders the queue in order of increasing ToE from the head of the queue.

void order_queue (int num_cells)
{
    double *ToE;
    double temp;
    int ix, jx;
    int sorted;
    Packet* pkptr;

    /* Allocate memory for array consisting of ToE entries. */
    ToE = (double*) op_prg_mem_alloc (num_cells*sizeof (double));

    /* Parse the queue and determine each cell's ToE. */
    for (ix = 0; ix < num_cells; ix++) {
        pkptr = op_subq_pk_access (0, ix);
        ToE[ix] = maxCTD - op_q_wait_time (pkptr);
    }

    /* Queue is originally unsorted. */
    sorted = OPC_FALSE;

    /* Perform a bubble sort; stop early once a pass makes no swaps. */
    for (ix = 0; !(sorted) && ix < (num_cells - 1); ix++) {
        sorted = OPC_TRUE;
        for (jx = 0; jx < (num_cells - ix - 1); jx++) {
            if (ToE[jx] > ToE[jx+1]) {
                temp = ToE[jx];
                ToE[jx] = ToE[jx+1];
                ToE[jx+1] = temp;
                op_subq_pk_swap (0, jx, jx+1);
                sorted = OPC_FALSE;
            }
        }
    }

    /* Free the memory. */
    op_prg_mem_free (ToE);
}
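The sort can be exercised outside the simulator by replacing the subqueue with a plain array. This is a sketch only: the ids array is an assumed stand-in for the packets that op_subq_pk_swap would exchange, and sort_by_toe is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Bubble sort by ascending time-to-expiry with early exit.
   ids[] is permuted in step with toe[], mirroring the packet swaps
   performed on the subqueue. Returns the number of swaps made. */
static int sort_by_toe(double toe[], int ids[], size_t n)
{
    int swaps = 0;
    int sorted = 0;

    assert(n == 0 || (toe != NULL && ids != NULL));
    for (size_t i = 0; !sorted && i + 1 < n; i++) {
        sorted = 1;
        for (size_t j = 0; j + 1 < n - i; j++) {
            if (toe[j] > toe[j + 1]) {
                double t = toe[j];
                int id = ids[j];
                toe[j] = toe[j + 1];
                ids[j] = ids[j + 1];
                toe[j + 1] = t;
                ids[j + 1] = id;
                sorted = 0;
                swaps++;
            }
        }
    }
    return swaps;
}
```

Because the queue is re-sorted at every service start and cells mostly arrive in expiry order, the early exit typically ends the sort after one or two passes.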



5. Init State 

The Init State initializes all statistics and counters and sets the QoS parameters 
required for each connection. Since only homogeneous traffic is considered, only a single 
set of parameters is listed. 

/* initially the server is idle */
server_busy = 0;

/* get queue module's own object id */
own_id = op_id_self ();

/* get assigned value of server processing rate */
op_ima_obj_attr_get (own_id, "service_rate", &service_rate);
pk_svc_time = 1.0 / service_rate;

/* Declare local statistics. */
clp_handle = op_stat_reg ("CLP", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
cell_handle = op_stat_reg ("Time", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
util_handle = op_stat_reg ("Utilization", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr0_handle = op_stat_reg ("CLPR0", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr1_handle = op_stat_reg ("CLPR1", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr2_handle = op_stat_reg ("CLPR2", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);

for (ix = 0; ix < MAX_SOURCE; ix++) {
    cell_count[ix] = 0;
    cells_dropped[ix] = 0;
    cells_waiting[ix] = 0;
    for (jx = 0; jx < MAX_LAYER; jx++) {
        gobDrop[ix][jx] = RETAIN;
        layerCellCount[ix][jx] = 0;
        layerCellsDropped[ix][jx] = 0;
    }
}

cells_serviced = 0;
op_stat_write (cell_handle, (double) cell_count[0]);

/* Declare the QoS parameters. */
maxCTD = 0.050;
maxCLP = 0.001;
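The two quantities set here, pk_svc_time and maxCTD, drive the expiry test in expire_cells() and the slot assignment in the SVC_Start State. The sketch below restates that arithmetic in standalone C. The function names are hypothetical, and the service rate of 1000 cells per second used in the usage note is an assumed value chosen for illustration, not a parameter taken from the simulations.

```c
#include <assert.h>

/* Service time of one cell, in seconds, for a server rate in cells/s. */
static double cell_service_time(double service_rate)
{
    assert(service_rate > 0.0);
    return 1.0 / service_rate;
}

/* Floor of x as an int, written without <math.h>. */
static int floor_to_int(double x)
{
    int n = (int) x;              /* truncates toward zero */
    return ((double) n > x) ? n - 1 : n;
}

/* Number of whole service slots left before the cell exceeds max_ctd. */
static int slots_until_expiry(double max_ctd, double wait_time, double svc_time)
{
    assert(svc_time > 0.0);
    return floor_to_int((max_ctd - wait_time) / svc_time);
}

/* A cell is expired when it can no longer complete service in time. */
static int is_expired(double max_ctd, double wait_time, double svc_time)
{
    return (max_ctd - wait_time) < svc_time;
}
```

With maxCTD = 0.050 s and an assumed 1 ms service time, a cell that has waited 10.5 ms has 39 whole slots left, while a cell that has waited 49.5 ms is treated as expired.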



6. Arrival State 

The Arrival State acquires arriving cells and updates the connection statistics. 
Each cell arrival also triggers recording of the CLP QoS statistic. 

/* acquire the arriving packet */
/* multiple arriving streams are supported. */
pkptr = op_pk_get (op_intrpt_strm ());

/* Get the source ID from the VCI and increment arrival count for the
   source and layer. */
op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
source_id = (atm_hdr_ptr->VCI - VCI_BASE);
layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
cell_count[source_id]++;
layerCellCount[source_id][layer_id]++;

/* Reload the header field struct. */
op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
    op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));

/* attempt to enqueue the packet at tail of subqueue 0 */
if (op_subq_pk_insert (0, pkptr, OPC_QPOS_TAIL) != OPC_QINS_OK)
{
    /* the insertion failed (due to a full queue) */
    /* deallocate the packet */
    op_pk_destroy (pkptr);
    cells_dropped[source_id]++;
    layerCellsDropped[source_id][layer_id]++;

    /* set flag indicating insertion fail */
    /* this flag is used to determine transition */
    /* out of this state */
    insert_ok = 0;
}
else {
    /* insertion was successful */
    insert_ok = 1;
    cells_waiting[source_id]++;
}

// Capture connection statistics.
total_arrived = 0;
total_dropped = 0;
for (ix = 0; ix < MAX_SOURCE; ix++) {
    total_arrived += cell_count[ix];
    total_dropped += cells_dropped[ix];
}

clp = ((double) total_dropped) / total_arrived;

if (op_sim_time () > 0.0) {
    op_stat_write (clp_handle, clp);
}

if (op_sim_time () > 0.0) {
    op_stat_write (cell_handle, ((double) total_arrived) / op_sim_time ());
}

if (layerCellCount[1][0] > 0) {
    op_stat_write (clpr0_handle, ((double) layerCellsDropped[1][0]) / layerCellCount[1][0]);
}

if (layerCellCount[1][1] > 0) {
    op_stat_write (clpr1_handle, ((double) layerCellsDropped[1][1]) / layerCellCount[1][1]);
}

if (layerCellCount[1][2] > 0) {
    op_stat_write (clpr2_handle, ((double) layerCellsDropped[1][2]) / layerCellCount[1][2]);
}
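The CLP statistic recorded on each arrival is the ratio of all dropped cells to all arrived cells, summed over the connections. A minimal sketch of that bookkeeping follows, assuming plain arrays in place of the state variables; loss_ratio and aggregate_clp are hypothetical names, and the zero-arrival guard is an addition that the Arrival State does not need because at least one cell has always arrived by the time it runs.

```c
#include <assert.h>
#include <stddef.h>

/* Loss ratio for one counter pair; defined as 0 when nothing has arrived. */
static double loss_ratio(long dropped, long arrived)
{
    assert(dropped >= 0 && arrived >= 0);
    return (arrived > 0) ? (double) dropped / (double) arrived : 0.0;
}

/* Aggregate cell loss probability across n connections. */
static double aggregate_clp(const long arrived[], const long dropped[], size_t n)
{
    long total_arrived = 0;
    long total_dropped = 0;

    for (size_t i = 0; i < n; i++) {
        total_arrived += arrived[i];
        total_dropped += dropped[i];
    }
    return loss_ratio(total_dropped, total_arrived);
}
```

The per-layer statistics written to clpr0_handle, clpr1_handle, and clpr2_handle apply the same ratio to the layer counters of a single source.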



7. SVC_Start State 

The SVC_Start State determines which cell to process after removing expired 
cells and discarding cells from corrupted GOBs. STEBR determines the winning 
connection, and the winning layer is then determined by cost filtering. Service deferral is 
optional. Code segments highlighted in bold text indicate where the cost-filtering algorithm 
can be altered and where service deferral may be activated. 

/* In this state, at least one cell may require service. Find the
   number of cells. */
num_cells = expire_cells ();

/* Sort the queue in descending order of ToE from the tail of the
   queue. */
if (num_cells > 0) {
    order_queue (num_cells);
}

/* Update the CLP ratios and the delta cost. */
for (ix = 0; ix < MAX_SOURCE; ix++) {
    iCLP = 0.0;
    delta[ix] = 0.0;
    if (cell_count[ix] > 0) {
        iCLP = ((double) cells_dropped[ix]) / cell_count[ix];
        delta[ix] = 1.0 / (cell_count[ix] * maxCLP);
    }
    clpr[ix] = iCLP/maxCLP;

    /* Update the layer statistics. */
    for (jx = 0; jx < MAX_LAYER; jx++) {
        iLayerCLP = 0.0;
        if (layerCellCount[ix][jx] > 0) {
            iLayerCLP = ((double) layerCellsDropped[ix][jx]) / layerCellCount[ix][jx];
        }
        layerCLPR[ix][jx] = iLayerCLP/maxCLP;
    }
}

/* Initialize the connection cost and extra cell counts. */
for (ix = 0; ix < MAX_SOURCE; ix++) {
    extra_cells[ix] = 0;
    cost[ix] = -1.0;

    /* Initialize the current layer count. */
    for (jx = 0; jx < MAX_LAYER; jx++) {
        layerCellsWaiting[ix][jx] = 0;
    }
}

/* Determine the current service slots and cost of each cell in the
   queue. */
if (num_cells > 0) {
    cell_cost = (double*) op_prg_mem_alloc (num_cells*sizeof (double));
    service_slot = (int*) op_prg_mem_alloc (num_cells*sizeof (int));
    slotSourceID = (int*) op_prg_mem_alloc (num_cells*sizeof (int));
}

for (ix = 0; ix < num_cells; ix++) {
    pkptr = op_subq_pk_access (0, ix);
    service_slot[ix] = (int) floor ((maxCTD - op_q_wait_time (pkptr)) / pk_svc_time);

    op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
    source_id = atm_hdr_ptr->VCI - VCI_BASE;
    layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
    slotSourceID[ix] = source_id;
    layerCellsWaiting[source_id][layer_id]++;
    op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
        op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
    clpr[source_id] += delta[source_id];
    cell_cost[ix] = clpr[source_id];
}

// STEBR starts here!

/* Grant service! */
if (num_cells > 0) {
    /* Work from tail of queue forward to head. */
    q_index = num_cells - 1;
    done = OPC_FALSE;

    for (slot = service_slot[num_cells-1]; (slot > 0) && (done != OPC_TRUE); slot--) {
        /* Examine cells in the current time slot. */
        while ((q_index >= 0) && (service_slot[q_index] == slot)) {
            source_id = slotSourceID[q_index];
            layer_id = slotLayerID[q_index];

            /* Cost out the source. */
            if (cost[source_id] >= 0) {
                extra_cells[source_id]++;
            }
            else {
                cost[source_id] = cell_cost[q_index];
            }
            q_index--;
        }

        /* Determine which connection is granted service in current slot */
        /* based only on connection costs. */
        max_cost = cost[0];
        max_index = 0;
        for (ix = 1; ix < MAX_SOURCE; ix++) {
            if (cost[ix] > max_cost) {
                max_cost = cost[ix];
                max_index = ix;
            }
        }

        /* Assign the source to this slot if there are cells available. */
        if (cost[max_index] >= 0) {
            winner = max_index;

            // Source has only one cell in the interval.
            if (extra_cells[max_index] == 0) {
                cost[max_index] = -1;
            }
            // Source has more than one cell in the interval.
            else {
                extra_cells[max_index]--;
                cost[max_index] = cost[max_index] - delta[max_index];

                // Load the layer costs.
                for (ix = 0; ix < MAX_LAYER; ix++) {
                    filteredCost[ix] = layerCost[winner][ix];
                }
            }
        }
        else if (q_index < 0) {
            done = OPC_TRUE;
        }
    } //for

    /* Locate a cell from the winning source. */
    /* Prune the costs of the winning source. */
    for (jx = 1; jx < MAX_LAYER; jx++) {
        if (layerCellsWaiting[winner][jx] == 0) {
            layerCLPR[winner][jx] = 0;
        }
    }

    /* Find the winning layer from the source. */
    for (ix = 0; ix < MAX_LAYER; ix++) {
        filteredCLPR[ix] = layerCLPR[winner][ix];
    }

    /* Filter the CLPR's to emphasize lower layers. */
    if (filteredCLPR[0] > 1.0) {
        filteredCLPR[1] = 0.0;
        filteredCLPR[2] = 0.0;
    }
    else if (filteredCLPR[1] > 1.0) {
        filteredCLPR[2] = 0.0;
    }

    /* Pick the layer with highest CLPR. */
    winningLayer = 0;
    max_CLPR = filteredCLPR[0];
    for (ix = 1; ix < MAX_LAYER; ix++) {
        if (filteredCLPR[ix] > max_CLPR) {
            winningLayer = ix;
            max_CLPR = filteredCLPR[ix];
        }
    }

    cell_to_send = 0;
    for (ix = 0; ix < num_cells; ix++) {
        if ((slotSourceID[ix] == winner) && (slotLayerID[ix] == winningLayer)) {
            cell_to_send = ix;
            break;
        }
    }

    //Activate service deferral here.
    /*
    if (service_slot[cell_to_send] > 2) {
        for (ix = 0; ix < num_cells; ix++) {
            if (slotSourceID[ix] == winner) {
                cell_to_send = ix;
                break;
            }
        }
    }
    */

    // Bubble the cell to head of the queue.
    if (cell_to_send > 0) {
        for (ix = cell_to_send; ix > 0; ix--) {
            op_subq_pk_swap (0, ix, ix-1);
        }
    }

    // Grant service to the cell.
    op_intrpt_schedule_self (op_sim_time () + pk_svc_time, 0);

    // The server is now busy.
    server_busy = 1;
}

/* Free memory. */
if (num_cells > 0) {
    op_prg_mem_free (cell_cost);
    op_prg_mem_free (service_slot);
    op_prg_mem_free (slotSourceID);
}


8. SVC_Complete State 

The SVC_Complete State removes from the queue the packet that has finished transmission. 

/* Cell at the head of the queue */
/* is just finishing service */
pkptr = op_subq_pk_remove (0, OPC_QPOS_HEAD);

/* Update the source cells waiting count. */
op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
source_id = (atm_hdr_ptr->VCI - VCI_BASE);
op_prg_mem_free (atm_hdr_ptr);
cells_waiting[source_id]--;

/* forward the packet on stream 0, */
/* causing an immediate interrupt at dest. */
op_pk_send_forced (pkptr, 0);

/* server is idle again. */
server_busy = 0;



B. LAYERED VIDEO SOURCE 

The layered video process model represents up to N layered video sources using a 
six-state MMRP with a deterministic arrival process. Cells from each layer of a 
particular source are multiplexed over a single VCI. Therefore, each cell is tagged using 
the scheme shown in Table II.3 to identify its parent layer. The finite state machine is 
shown in Figure A.2. 

Figure A.2: Finite State Machine for a Layered Video Traffic Model. 



1. Header Block 



#include "ams_pk_support.h"

#define SEND_CELL       0
#define CHANGE_STATE    100
#define MAX_SOURCE      7
#define MAX_LAYER       3

#define NEW_STATE   ((op_intrpt_type () == OPC_INTRPT_SELF) && \
                     (op_intrpt_code () >= CHANGE_STATE))

#define NEW_CELL    ((op_intrpt_type () == OPC_INTRPT_SELF) && \
                     ((op_intrpt_code () >= SEND_CELL) && \
                      (op_intrpt_code () <= (SEND_CELL + (MAX_SOURCE+1)*10))))

#define INF         9999999999
#define VCI_BASE    100

/* Event code = (Event type)(Source ID)(Layer ID) for 3 decimal digits. */
/* Cells are tagged by VCI = VCI_BASE + Source_ID. */
/* Originating layer is indicated by the SDU and CLP bits. */

AtmT_Cell_Header_Fields* set_header (int, int);
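The self-interrupt codes used by this model pack the event type and the source ID into one integer: the hundreds digit carries the event type and the tens digit carries the source. A small sketch of that encoding follows, assuming the constants from the header block; make_code, is_state_change, and source_of are hypothetical helpers, not part of the process model, which performs the same arithmetic inline.

```c
#include <assert.h>

enum { SEND_CELL = 0, CHANGE_STATE = 100, MAX_SOURCE = 7 };

/* Build an interrupt code from an event type and a source ID. */
static int make_code(int event_type, int source_id)
{
    assert(event_type == SEND_CELL || event_type == CHANGE_STATE);
    assert(source_id >= 0 && source_id < MAX_SOURCE);
    return event_type + 10 * source_id;
}

/* State-change codes occupy 100 and above; cell codes stay below. */
static int is_state_change(int code)
{
    return code >= CHANGE_STATE;
}

/* Recover the source ID from either kind of code. */
static int source_of(int code)
{
    int base = is_state_change(code) ? CHANGE_STATE : SEND_CELL;
    return (code - base) / 10;
}
```

With seven sources, cell codes range from 0 to 60 and state-change codes from 100 to 160, so the two ranges tested by NEW_CELL and NEW_STATE never overlap.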



2. State Variable Block

Objid           \self_id;
int             \curr_state[MAX_SOURCE];
int             \next_state[MAX_SOURCE];
Distribution**  \state_dist;
double          \transit_time;
double          \interval;
Stathandle      \state0_shandle;
Stathandle      \state1_shandle;
Stathandle      \rate_shandle;
int             \sources;
Evhandle        \cell_intrpt[MAX_SOURCE];
int             \layer_state[MAX_SOURCE];



3. Temporary Variable Block 

double M[6][6] = {{0.000, 1.807, 0.636, 0.153, 0.025, 0.000}, \
                  {1.240, 0.000, 0.288, 0.399, 0.044, 0.022}, \
                  {5.667, 0.833, 0.000, 0.167, 0.000, 0.000}, \
                  {2.800, 3.920, 0.280, 0.000, 0.000, 0.000}, \
                  {7.000, 0.000, 0.000, 0.000, 0.000, 0.000}, \
                  {0.000, 7.000, 0.000, 0.000, 0.000, 0.000}};
double lambda[6] = {132.82, 232.85, 332.87, 432.90, 532.92, 632.95};

Packet* cell_ptr;
int ix;
int jx;
int source_id;
int session_id;
int layer_id;
AtmT_Cell_Header_Fields* atm_hdr_ptr;



4. Function Block 

set_header() creates an ATM cell header structure with the appropriate SDU- and 
CLP-bit tags for the layer. 

AtmT_Cell_Header_Fields* set_header (int source_id, int layer_id)
{
    AtmT_Cell_Header_Fields* atm_hdr_ptr;

    // Allocate memory for header fields.
    atm_hdr_ptr =
        (AtmT_Cell_Header_Fields*) op_prg_mem_alloc (sizeof (AtmT_Cell_Header_Fields));

    // Load the VCI.
    atm_hdr_ptr->VCI = VCI_BASE + source_id;

    // Identify the layer.
    switch (layer_id) {
        case 0:
            atm_hdr_ptr->PT = 0;
            atm_hdr_ptr->CLP = 0;
            break;
        case 1:
            atm_hdr_ptr->PT = 1;
            atm_hdr_ptr->CLP = 0;
            break;
        case 2:
            atm_hdr_ptr->PT = 0;
            atm_hdr_ptr->CLP = 1;
            break;
    }

    return atm_hdr_ptr;
}
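The tags written by set_header() are the inverse of the expression layer_id = PT + 2*CLP used by the queue model to recover the layer. The following sketch checks that round trip for the three layers; LayerTag, encode_layer, and decode_layer are hypothetical names that stand in for the two header fields and the surrounding code.

```c
#include <assert.h>

/* The two header bits used to tag a cell with its layer. */
typedef struct {
    int pt;   /* SDU-type bit carried in the PT field */
    int clp;  /* cell loss priority bit */
} LayerTag;

/* Layer 0 -> (PT=0, CLP=0), layer 1 -> (PT=1, CLP=0), layer 2 -> (PT=0, CLP=1). */
static LayerTag encode_layer(int layer_id)
{
    LayerTag tag;

    assert(layer_id >= 0 && layer_id <= 2);
    tag.pt = layer_id % 2;
    tag.clp = layer_id / 2;
    return tag;
}

/* Inverse mapping used on the receiving side. */
static int decode_layer(LayerTag tag)
{
    return tag.pt + 2 * tag.clp;
}
```

Placing the highest layer on CLP = 1 means that a switch which discards low-priority cells under congestion drops that layer first.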



5. Init State 

The Init State creates an array of exponential distributions to represent transitions 
between states in the MMRP model. Each source is started arbitrarily in state 0. 

/* get source module's own object id */
self_id = op_id_self ();

/* get the requested number of multiplexed video sources */
op_ima_obj_attr_get (self_id, "Number_of_Sources", &sources);

/* allocate space and load distributions */
state_dist =
    (Distribution**) (op_prg_mem_alloc (sizeof (Distribution*)*36));

for (ix = 0; ix < 6; ix++) {
    for (jx = 0; jx < 6; jx++) {
        if (M[ix][jx] > 0.0) {
            state_dist[ix*6 + jx] =
                op_dist_load ("exponential", 1.0/M[ix][jx], 0);
        }
        else {
            state_dist[ix*6 + jx] = op_dist_load ("exponential", INF, 0);
        }
    }
}

/* generate an initial interrupt for each source, arbitrarily */
/* choosing the 0th state. */
for (ix = 0; ix < sources; ix++) {
    next_state[ix] = 0;
    op_intrpt_schedule_self (op_sim_time (), CHANGE_STATE + 10*ix);
}



6. Transition State 

The Transition State reflects that a source is transitioning between states in the 
MMRP model. The time until the next transition is determined. The arrival rate for that 
source is updated to reflect the current state. 

/* One of the sources is changing state; get the source's id. */
session_id = op_intrpt_code () - CHANGE_STATE;
source_id = session_id/10;

/* Cancel the pending cell transmission self-interrupt for this source. */
if (op_ev_valid (cell_intrpt[source_id])) {
    op_ev_cancel (cell_intrpt[source_id]);
}

/* Assign the new current state. */
curr_state[source_id] = next_state[source_id];

/* Find next state and transition time */
next_state[source_id] = 0;
transit_time = op_dist_outcome (state_dist[curr_state[source_id]*6]);

/* Search for the shortest time, this is the next state. */
for (ix = 1; ix < 6; ix++) {
    interval = op_dist_outcome (state_dist[curr_state[source_id]*6 + ix]);
    if (interval < transit_time) {
        transit_time = interval;
        next_state[source_id] = ix;
    }
}

/* Reset the layer state counter. */
layer_state[source_id] = 0;
layer_id = layer_state[source_id];

/* Create a new formatted ATM cell. */
cell_ptr = op_pk_create_fmt ("ams_atm_cell");

/* Allocate memory for the header and assign fields. */
atm_hdr_ptr = set_header (source_id, layer_id);

/* ID the first cell of a GOB */
atm_hdr_ptr->GFC = 1;

/* Load the ATM header and transmit the cell. */
op_pk_nfd_set (cell_ptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
    op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
op_pk_send (cell_ptr, 0);

/* Schedule the next cell departure at the current state's arrival rate. */
cell_intrpt[source_id] = op_intrpt_schedule_self (op_sim_time () +
    1.0/lambda[curr_state[source_id]], \
    SEND_CELL + 10*source_id);

/* Schedule state transition */
op_intrpt_schedule_self (op_sim_time () + transit_time, CHANGE_STATE +
    10*source_id);
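The next state is chosen by drawing one exponential interval for every candidate destination and keeping the smallest, which is the usual race between competing exponential clocks in a continuous-time Markov chain. The sketch below shows only the selection step and assumes the six intervals have already been drawn (in the model they come from op_dist_outcome); Transition and pick_transition are hypothetical names.

```c
#include <assert.h>

enum { NUM_STATES = 6 };

/* Result of the race: destination state and time until the jump. */
typedef struct {
    int next_state;
    double transit_time;
} Transition;

/* Given one sampled interval per destination state, the shortest
   interval wins. Ties go to the lowest-numbered state, matching the
   strict '<' comparison in the process model. */
static Transition pick_transition(const double interval[NUM_STATES])
{
    Transition t;

    t.next_state = 0;
    t.transit_time = interval[0];
    for (int k = 1; k < NUM_STATES; k++) {
        assert(interval[k] >= 0.0);
        if (interval[k] < t.transit_time) {
            t.transit_time = interval[k];
            t.next_state = k;
        }
    }
    return t;
}
```

Destinations with a zero transition rate are given a very large mean (INF in the header block), so their sampled intervals almost never win the race.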

7. Send_cell State 

The Send_cell State transmits a new cell and schedules the next departure using 
the current arrival rate. In addition, the state determines the identity of the layer sending 
the cell. Bit allocation among layers, each layer’s GOB length, and the manner of 
interleaving are all set here. 

/* One of the sources is sending a cell; get the source's id. */
session_id = op_intrpt_code () - SEND_CELL;
source_id = session_id/10;

/* Determine the layer id. */
layer_state[source_id]++;
if (layer_state[source_id] > 7) {
    layer_state[source_id] = 0;
}

switch (layer_state[source_id]) {
    case 0: case 1: case 2: case 3:
        layer_id = 0;
        break;
    case 4: case 5:
        layer_id = 1;
        break;
    case 6: case 7:
        layer_id = 2;
        break;
}

/* Create a new formatted ATM cell. */
cell_ptr = op_pk_create_fmt ("ams_atm_cell");

/* Allocate memory for the header and assign fields. */
atm_hdr_ptr = set_header (source_id, layer_id);

/* ID the first cell of a GOB */
if (layer_state[source_id] == 0) {
    atm_hdr_ptr->GFC = 1;
}
else {
    atm_hdr_ptr->GFC = 0;
}

/* Load the ATM header and transmit the cell. */
op_pk_nfd_set (cell_ptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create, \
    op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
op_pk_send (cell_ptr, 0);

/* Schedule next cell departure. */
cell_intrpt[source_id] = op_intrpt_schedule_self (op_sim_time () +
    1.0/lambda[curr_state[source_id]], \
    SEND_CELL + 10*source_id);
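The switch statement in this state fixes both the bit allocation and the interleaving: each GOB is an eight-cell cycle with four base-layer cells followed by two cells from each enhancement layer. A minimal sketch of that mapping is given below; the function names are hypothetical, and changing the thresholds is what would alter the allocation among layers.

```c
#include <assert.h>

enum { GOB_CYCLE = 8 };

/* Layer that owns a given position within the eight-cell GOB cycle. */
static int layer_for_position(int pos)
{
    assert(pos >= 0 && pos < GOB_CYCLE);
    if (pos < 4) {
        return 0;   /* positions 0-3: base layer */
    }
    if (pos < 6) {
        return 1;   /* positions 4-5: first enhancement layer */
    }
    return 2;       /* positions 6-7: second enhancement layer */
}

/* Advance the per-source counter, wrapping at the end of the cycle. */
static int next_position(int pos)
{
    return (pos + 1) % GOB_CYCLE;
}

/* Position 0 carries the GOB header flag (GFC = 1 in the cell header). */
static int is_gob_header(int pos)
{
    return pos == 0;
}
```

Under this pattern the base layer receives half of the cells and each enhancement layer receives a quarter.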
APPENDIX B. MMRP MODEL PARAMETERS 



The MMRP model parameters used in the OPNET simulations were developed 
using the procedure outlined in Section V.B. In accordance with the discussion presented 
in [90], the rate-controlled video trace shown in Figure B.1 was quantized to six levels. 
Table B.1 gives the state distribution vector calculated for the case of six states and the 
associated state arrival rates. Table B.2 gives the associated infinitesimal generating 
function for a frame rate of 10 fps. 




Figure B.1: Rate-controlled VBR Video Sequence. 



State i        1        2        3        4        5        6
π_i          0.4136   0.4806   0.0618   0.0379   0.0045   0.0015
λ_i (cps)    132.82   232.85   332.87   432.90   532.92   632.95

Table B.1: State Probabilities and Arrival Rates for Quantized Video Source. 
\[
M = \begin{bmatrix}
-2.6218 &  1.8073 &  0.6364 &  0.1527 &  0.0255 &  0      \\
 1.2405 & -1.9937 &  0.2880 &  0.3987 &  0.0443 &  0.0222 \\
 5.6667 &  0.8333 & -6.6667 &  0.1667 &  0      &  0      \\
 2.8000 &  3.9200 &  0.2800 & -7.0000 &  0      &  0      \\
 7.0000 &  0      &  0      &  0      & -7.0000 &  0      \\
 0      &  7.0000 &  0      &  0      &  0      & -7.0000
\end{bmatrix}
\]

Table B.2: Infinitesimal Generating Function for Quantized Video Source. 
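The values in Tables B.1 and B.2 can be checked for internal consistency: each row of a generator matrix should sum to zero, the state distribution should approximately satisfy the balance condition (the row vector of state probabilities times M is close to zero), and the mean cell rate is the probability-weighted sum of the state arrival rates. The sketch below performs these checks on the tabulated four-decimal values; the function names are hypothetical, and the residuals are nonzero only because of rounding in the tables.

```c
#include <assert.h>

enum { NUM_STATES = 6 };

/* Table B.2: infinitesimal generator, rows are origin states. */
static const double M[NUM_STATES][NUM_STATES] = {
    { -2.6218,  1.8073,  0.6364,  0.1527,  0.0255,  0.0    },
    {  1.2405, -1.9937,  0.2880,  0.3987,  0.0443,  0.0222 },
    {  5.6667,  0.8333, -6.6667,  0.1667,  0.0,     0.0    },
    {  2.8000,  3.9200,  0.2800, -7.0000,  0.0,     0.0    },
    {  7.0000,  0.0,     0.0,     0.0,    -7.0000,  0.0    },
    {  0.0,     7.0000,  0.0,     0.0,     0.0,    -7.0000 }
};

/* Table B.1: state probabilities and arrival rates (cells per second). */
static const double state_prob[NUM_STATES] =
    { 0.4136, 0.4806, 0.0618, 0.0379, 0.0045, 0.0015 };
static const double lambda[NUM_STATES] =
    { 132.82, 232.85, 332.87, 432.90, 532.92, 632.95 };

static double absd(double x) { return (x < 0.0) ? -x : x; }

/* Largest absolute row sum of M; zero for an exact generator. */
static double max_row_sum(void)
{
    double worst = 0.0;
    for (int i = 0; i < NUM_STATES; i++) {
        double s = 0.0;
        for (int j = 0; j < NUM_STATES; j++) s += M[i][j];
        if (absd(s) > worst) worst = absd(s);
    }
    return worst;
}

/* Largest absolute entry of (state_prob * M); zero at exact stationarity. */
static double max_balance_residual(void)
{
    double worst = 0.0;
    for (int j = 0; j < NUM_STATES; j++) {
        double s = 0.0;
        for (int i = 0; i < NUM_STATES; i++) s += state_prob[i] * M[i][j];
        if (absd(s) > worst) worst = absd(s);
    }
    return worst;
}

/* Mean cell rate implied by Table B.1, in cells per second. */
static double mean_cell_rate(void)
{
    double mean = 0.0;
    for (int i = 0; i < NUM_STATES; i++) mean += state_prob[i] * lambda[i];
    return mean;
}
```

For the tabulated values the mean cell rate evaluates to roughly 207 cells per second, and both residuals stay below 0.001.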

For this sequence, six states give excellent results. Figure B.2 demonstrates how 
closely the MMRP captures the histogram of the original source. The mean bit rate is 
overpredicted, but by less than 1% of the actual mean bit rate. Figure B.3 displays the 
autocorrelation function of both the model and the sequence, illustrating a close match 
over a period of 30 seconds. Figure B.3 also includes the model autocorrelation function 
when seven states are used. How closely the model tracks the autocorrelation function 
depends on how accurately it predicts the mean bit rate. For this sequence, using 
seven states gives a worse overprediction of the mean bit rate, which leads to the bias 
displayed in tracking the autocorrelation function. Increasing the number of states did 
not guarantee better results until a prohibitively large number of states was employed. 




Figure B.2: Predicted and Actual Histograms. 
Figure B.3: Actual and Predicted Autocorrelation Functions. 